(54) Apparatus and method for extracting management information from image



(57) A management information extraction appara- 
tus learns the structure of ruled lines of a document and 
the position of user-specified management information 
such as a title, etc. during a form learning process, and 
stores them in a layout dictionary (23). During the oper- 
ation, the structure of the ruled lines extracted from an 
image of an input document is matched with that of the 
document in the layout dictionary. Then, position infor- 
mation in the layout dictionary is referred to, and the 
management information is extracted from the input
document.
The present invention relates to a system for converting documents and drawings into image data through an input
device [...], adding management information to the image data, and accumulating the resultant data [...].

[...] method of storing information on paper has been switched to a method of storing data
on electronic media. For example, an electronic filing system converts documents stored on paper into document
images by an opto-electrical converter such as an image scanner, etc. and stores the converted document images [...]
with management information such as a [keyword] for retrieval added to the converted document images.

[Since documents] are stored as image data in the above described method, a larger disk capacity is required than
[in a method in which] documents are stored after being encoded in a character recognition technology.
However, [...] at a high process speed, and pictures and tables [containing] data other than [...].

[...] stored as [...]. On the other hand, the stored information should be retrieved
[...] information such as a keyword, numbers, etc. together with document images. The
[conventional systems require] much effort [...] in assigning a keyword, and do not bring [user]-friendly technology.

To [solve] the awkwardness of conventional systems, the title of a document can be assumed to
be [a keyword, ...] extracted, recognized as characters, and encoded for storage with document images.
[...] the speed of recognizing characters is up to several tens of characters per second, and it takes about
[...] minutes to process a normal document page (approximately 21 cm x 29.5 cm). Therefore,
it is [preferable] not to recognize all characters of an entire document, but to first extract necessary titles from the
images of the document and then recognize them.

The conventional technology of extracting a part of a document, for example, a title of the document from a
document image obtained by reading the document through an opto-electrical converter is described in "TITLE
EXTRACTING APPARATUS FOR EXTRACTING TITLE FROM DOCUMENT IMAGE AND METHOD THEREOF" (US Patent
[...] Patent Application H7-341983) filed by the Applicant of the present invention.

FIG. 1A shows the principle of the title extracting apparatus. 

The title extracting apparatus shown in FIG. 1A comprises a character area generation unit 1, a character string
area generation unit 2, and a title extraction unit 3. The character area generation unit 1 extracts [...] a partial
pattern such as a part of a character, etc. from a document image input through a scanner, etc. Then, it extracts
(generates) a character area by integrating [...]. The character string area generation unit 2 integrates a plurality
of character areas and extracts (generates) a character string area. The title extraction unit 3 extracts as a title
area a character string area which is probably a title.

[At this time,] the title extraction unit 3 utilizes notable points such as a top and center position, a character size
[...] of the document, an underlined representation, etc. as the probability of a title area. [...]
a score for each of the character string areas to finally obtain a plurality of candidates
[...] from the highest score to the lowest. In the above described process, title areas can be
extracted from documents containing no tables.

[On the other hand, when a document] contains a [table, the] title extraction unit 3 extracts a title area in consideration
of the condition of [the number of] characters after the character string area generation unit 2 extracts a character string
area in the table. For example, the number of characters indicating the name of an item implying the existence of the
title is comparatively small such as 'Subject', 'Name', etc. The number of characters forming a character string
representing [the title itself] is probably large such as '[...] relating to [...]'. Thus, a character string which is probably a title can
be detected from adjacent character strings by utilizing the number of characters in the character strings.

[However, there] are a large number of table-formatted documents using ruled lines such as slips, etc. Therefore,
the [above described] technology has the problem that there is [...] probability that a [title] can be [...].
[For example, when a title] is written at the center of or around the bottom in a table, the title may not be correctly
[extracted only by extracting] character strings from the top by priority. Furthermore, as shown in FIG. 1B, an approval
[...] 'manager', 'sub-manager', 'person in charge', etc. in the approval column 11, then these character strings are extracted
by priority, thereby failing in correctly extracting the title.

[As shown by] a combination of an item name 12 and a title 13, a title may be written below the item name 12, not
on the [right of it. ...] the [...] positions of the item name and the title cannot be
[recognized ...]
the position of the item name. When a document contains two tables, the title may be located somewhere in a smaller
table.

Since a document containing tables can be written in various formats, the probability of a title depends on each
document, and the precision of extracting a title in a table is lowered. If the state of an input document image is not
good, the extraction precision is lowered further.

In an electronic filing system, an extracted title area is character-recognized by an optical character reader (OCR)
to generate a character code and add it to the image as management information. Thus, the image in a database can
be retrieved using a character code.

In this case, there is no problem if the character string in a title area is readable by an OCR. However, if a background
shows a textured pattern or characters are in designed fonts, then the current OCR cannot recognize a character string.
Therefore, in this case, management information cannot be added to an image.

The present invention aims at providing an apparatus and method of extracting appropriate management infor- 
mation for use in managing an image in a document in various formats, and an apparatus and method of accumulating 
images according to the management information. 

An image management system having the management information extraction apparatus and the image accumulation
apparatus according to the present invention includes a user entry unit, a computation unit, a dictionary unit, a
comparison unit, an extraction unit, a storage unit, a group generation unit, and a retrieval unit.

According to the first aspect of the present invention, the computation unit computes the position of the manage- 
ment information contained in an arbitrary input image according to the position information about the position of a 
ruled line relative to the outline portion of a table area contained in the input image. The extraction unit extracts the 
management information from the input image based on the position computed by the computation unit.

In the second aspect of the present invention, the dictionary unit stores the features of the structures of the ruled 
lines of one or more table forms, and the position information about the management information in each of the table 
forms. The comparison unit compares the feature of the structure of the ruled lines of the input image with the feature 
of the structure of the ruled lines stored in the dictionary unit. The extraction unit refers to the position information about 
the management information stored in the dictionary unit based on the comparison result from the comparison unit,
and extracts the management information about the input image. The user entry unit enters the position of the man- 
agement information specified by the user in the dictionary unit. 

According to the third aspect of the present invention, the storage unit stores image information as management
information for an accumulated image. The retrieval unit retrieves the image information.

According to the fourth aspect of the present invention, the storage unit stores ruled line information about a table
form. The group generation unit obtains a plurality of possible combinations between the ruled line extracted from an
input image and the ruled line contained in the ruled line information in the storage unit, and extracts a group containing
two or more compatible combinations from the plurality of combinations in such a way that no combinations of another
group can be contained. The comparison unit compares the input image with the table form according to the information
about combinations contained in one or more extracted groups.

Reference will now be made, by way of example, to the accompanying drawings in which: 

FIG. 1A shows the configuration of the title extraction apparatus according to a filed application;
FIG. 1B shows a table-formatted document;
FIG. 2A shows the principle of the management information extraction apparatus;
FIG. 2B shows the management information extracting process;
FIG. 3 is the first flowchart showing the process performed when a form is learned;
FIG. 4 is the first flowchart showing the process performed during the operation;
FIG. 5 shows the configuration of the information processing apparatus;
FIG. 6 is the second flowchart showing the process performed when a form is learned;
FIG. 7 shows a ruled line structure extracting process;
FIG. 8 shows a management information position specifying process;
FIG. 9 shows the first ruled line feature of the rough classification;
FIG. 10 shows the second ruled line feature of the rough classification;
FIG. 11 shows the third ruled line feature of the rough classification;
FIG. 12 shows the fourth ruled line feature of the rough classification;
FIG. 13 shows a method of extracting an intersection string;
FIG. 14 shows an intersection string;
FIG. 15 is a flowchart showing a cross ratio computation process;
FIG. 16 shows the feature of the ruled lines indicating an outline using a cross ratio;
FIG. 17 is the second flowchart showing the process performed during the operation;
FIG. 18 shows a DP matching;
FIG. 19 is a flowchart showing a DP matching process;
FIG. 20 is a flowchart (1) showing a management information position computing process;
FIG. 21 is a flowchart (2) showing a management information position computing process;
FIG. 22 is a flowchart (3) showing a management information position computing process;
FIG. 23 shows a process of extracting management information using a user entry mode and an automatic learning
mode;
FIG. 24 is a flowchart showing an intra-table management information extracting process;
FIG. 25 is a flowchart showing a management information extracting process for a document image without ruled
lines;
FIG. 26 is a flowchart showing a management information storage process;
FIG. 27 is a management information storage table;
FIG. 28 is a flowchart showing a management information retrieving process;
FIG. 29 is an association graph;
FIG. 30 is a flowchart showing a form identifying process;
FIG. 31 shows a reference width, a reference height, and a reference point;
FIG. 32 shows a horizontal ruled line;
FIG. 33 shows a vertical ruled line;
FIG. 34 shows detailed information about the horizontal ruled lines;
FIG. 35 shows detailed information about the vertical ruled lines;
FIG. 36 is a flowchart showing a model matching process;
FIG. 37 is a matching table;
FIG. 38 shows a function of a threshold;
FIG. 39 shows a case in which a sequence is inverted;
FIG. 40 shows a case in which two corresponding ruled lines are assigned;
FIG. 41 shows the correspondence of ruled lines represented by the optimum path set;
FIG. 42 is a flowchart showing a node arranging process;
FIG. 43 is a flowchart (1) showing a path generating process;
FIG. 44 is a flowchart (2) showing a path generating process;
FIG. 45 shows a node string of a storage unit;
FIG. 46 shows a determining process using detailed information;
FIG. 47 is a flowchart showing an optimum path set determining process; and
FIG. 48 is a flowchart showing a node number updating process.

The preferred embodiments of the present invention are described below in detail by referring to the attached 
drawings. 

FIG. 2A shows the principle of an image management system including the management information extraction 
apparatus and the image accumulation apparatus according to the present invention. This system includes the first, 
second, third, and fourth principles of the present invention and comprises a user entry unit 21, a computation unit 22,
a dictionary unit 23, a comparison unit 24, an extraction unit 25, a storage unit 26, a group generation unit 27 and a 
retrieval unit 28. 

According to the first principle of the present invention, a computation unit 22 computes the position of the man- 
agement information contained in an input image based on the information about the position of a ruled line relative 
to the outline portion of the table area contained in the input image. An extraction unit 25 extracts the management
information from the input image based on the position computed by the computation unit 22. 

For example, as information about the outline portion of a table area, a reference size of a table area, or a position 
of a reference point close to the outline of the table area, is used. The computation unit 22 represents the position of 
each ruled line extracted from the table area as the information about the position relative to the reference point, and 
obtains the position of the management information from the position information of the ruled lines encompassing the 
management information. The extraction unit 25 extracts the image data corresponding to the position as management 
information and recognizes characters as necessary. 

The management information can be extracted with precision by obtaining the relative positions of ruled lines 
encompassing the management information for a plurality of reference points in the outline portion of a table or in a 
plurality of directions even if the state of an input image is inferior due to breaks, noise, etc. 

According to the second principle of the present invention, a dictionary unit 23 stores features of the structures of 
the ruled lines of one or more table forms, and position information of the management information in each of the table 
forms. A comparison unit 24 compares the feature of the structure of the ruled line of an input image with the feature 
of the structure of the ruled line stored in the dictionary unit 23. The extraction unit 25 refers to the position information 
about management information stored in the dictionary unit 23 based on the comparison result obtained from the 
comparison unit 24, and extracts the management information of the input image. A user entry unit 21 enters the 
position of the management information specified by the user in the dictionary unit 23.

A table form refers to the layout structure of ruled lines forming the table. The dictionary unit 23 preliminarily stores 
the features of the structure of the ruled lines and the position of the management information specified by the user 
entry unit 21. The comparison unit 24 obtains a table form having the features of the structure of the ruled lines similar
to those of the input image. The extraction unit 25 extracts the management information from the position specified in
the table form. 

Thus, by preliminarily entering the position of user-requested management information and extracting the
management information at the specified position from an input image, management information can be precisely
extracted from each image even if various form images are entered.

According to the third principle of the present invention, a storage unit 26 stores image information as the
management information for an accumulated image. A retrieval unit 28 retrieves the image information.

For example, in the electronic filing apparatus for accumulating a number of images, an image code extracted
from each image is stored in the storage unit 26 as the management information. The retrieval unit 28 retrieves
management information by comparing a given image code with an image code in the storage unit 26 through, for example,
template matching.
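
The retrieval by image code described above can be illustrated with a minimal sketch. It assumes that an image code is a small binary bitmap and that the match score is the fraction of differing pixels; the names `mismatch`, `retrieve`, and `max_mismatch` are hypothetical and are not taken from this specification.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # binary bitmap: rows of 0/1 pixels (assumed representation)


def mismatch(a: Image, b: Image) -> float:
    """Fraction of differing pixels; 1.0 if the shapes differ or the image is empty."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 1.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 1.0
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def retrieve(query: Image, store: Dict[str, Image],
             max_mismatch: float = 0.1) -> Optional[str]:
    """Return the id of the stored image code closest to the query, or None."""
    best_id, best_score = None, max_mismatch
    for doc_id, code in store.items():
        score = mismatch(query, code)
        if score <= best_score:
            best_id, best_score = doc_id, score
    return best_id


store = {"doc1": [[1, 0], [0, 1]], "doc2": [[1, 1], [1, 1]]}
found = retrieve([[1, 0], [0, 1]], store)
missing = retrieve([[0, 0], [0, 0]], store)
```

A practical system would also have to tolerate shifts and scale differences between the query and the stored code, which this pixel-by-pixel comparison does not.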

Thus, the present invention not only stores/retrieves a character string of management information in character 
codes, but also stores/retrieves the character string as an image itself. Therefore, a character such as a textured 
character, a designed font, a logo, etc. which is hard to correctly recognize can be processed as management infor- 
mation. 

According to the fourth principle of the present invention, the storage unit 26 stores ruled line information about
the table form. A group generation unit 27 obtains a plurality of possible combinations between ruled lines extracted
from an input image and the ruled lines contained in the ruled line information in the storage unit 26, and extracts a
group containing two or more combinations compatible to each other from among the plurality of combinations in a
way that the extracted group may not contain a combination in another group. The comparison unit 24 compares the
input image with the table form according to the information about the combination contained in one or more extracted
groups.

The group generation unit 27 obtains a possible combination of the ruled lines of an input image and the ruled
lines of the table form to identify the form of the input image using the table form stored in the storage unit 26. At this
time, for example, ruled lines similar to each other in size and position relative to the entire table are retrieved as a
possible combination.

Then, it is determined whether or not two combinations are compatible by comparing the relation between the
ruled lines contained in an input image with the relation between the ruled lines of the table form. At this time, the
number of the objects to be compatibility-checked can be reduced and the process can be efficiently performed by
generating a new group in a way that no combinations already contained in other groups can be included.

The comparison unit 24 considers that a larger number of combinations contained in the optimum set of groups
indicates a higher similarity between an input image and the table form, and determines the table form having the
highest similarity as a form corresponding to the input image.

Thus, the form of an input image can be rapidly identified, and a management information extracting process can 
be performed efficiently. 
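
The candidate combinations and the compatibility check described above can be sketched as follows. This is only an illustration under stated assumptions: each ruled line is reduced to a normalised position and length, the tolerance `tol` is arbitrary, and compatibility is taken to mean that two combinations preserve the order of the lines. The names `Line`, `candidate_pairs`, and `compatible` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Line:
    pos: float     # offset across the table, normalised to 0..1 (assumption)
    length: float  # length normalised by the table size (assumption)


Pair = Tuple[int, int]  # (index of input line, index of form line)


def candidate_pairs(inp: List[Line], form: List[Line], tol: float = 0.1) -> List[Pair]:
    """Pair lines that are similar in size and in position relative to the table."""
    return [(i, m)
            for (i, a), (m, b) in product(enumerate(inp), enumerate(form))
            if abs(a.pos - b.pos) <= tol and abs(a.length - b.length) <= tol]


def compatible(p: Pair, q: Pair, inp: List[Line], form: List[Line]) -> bool:
    """Assumed rule: two combinations agree if they keep the same line order."""
    (i1, m1), (i2, m2) = p, q
    if i1 == i2 or m1 == m2:
        return False  # a line cannot correspond to two different lines
    return (inp[i1].pos < inp[i2].pos) == (form[m1].pos < form[m2].pos)


inp = [Line(0.0, 1.0), Line(0.5, 1.0), Line(1.0, 1.0)]
form = [Line(0.02, 1.0), Line(0.48, 0.98), Line(1.0, 1.0)]
pairs = candidate_pairs(inp, form)
```

Grouping mutually compatible pairs and counting them per form would then give the similarity used by the comparison unit 24; that grouping step is omitted here.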

For example, the user entry unit 21 shown in FIG. 2A corresponds to an input unit 43 shown in FIG. 5, which is
explained later, and the dictionary unit 23 and the storage unit 26 correspond to an external storage unit 45 in FIG. 5.
Furthermore, the computation unit 22, the comparison unit 24, the extraction unit 25, the group generation unit 27, and
the retrieval unit 28 correspond to a central processing unit (CPU) 41 and memory 42 in FIG. 5.

According to the present invention, the layout structure of the ruled lines in a well-known table format is learned
for use in various applications. The learned information is used to extract a title, etc. with precision from an unknown
table format. To attain this, a form learning mode and an operation mode are set. The layout structure may be hereinafter
referred to as a format structure or a form.

FIG. 2B shows the outline of the management information extracting process. The management information
extraction apparatus first learns the layout of the ruled lines of documents A, B, etc. in known formats and the
user-specified position of a correct title area, etc. during the learning process. Then, a layout dictionary (form dictionary) 31
including the above listed information is generated.

The mode in which the user specifies the position of a title can be either a user entry mode without form recognition 
of documents A and B or an automatic learning mode with form recognition. The operations in each mode are described 
later. 

During the operation, the management information extraction apparatus extracts the layout of the ruled lines from
an input unknown document 32, and matches the layout with the layout dictionary 31. Thus, a document in a format
matching the layout stored in the layout dictionary can be identified. In this example, the layout of the document 32
matches that of the document A.

Then, the management information extraction apparatus refers to the information about the position of a title
specified by the corresponding document A, and extracts the title from a character string area 33 of the document 32 with
high precision. Furthermore, management information about various documents can be extracted with high precision
by instructing a user to specify not only a title but also other tag areas such as a date, etc. as management information.

Since management information should be quickly and automatically extracted when a user inputs a document 
using a scanner during the operation, a high-speed algorithm characterized by an interactive operation is adopted in
the present invention. In this algorithm, a classification process can be performed at a high speed by specifying can- 
didates for a corresponding form to the input document first in a rough classification, and then in a detailed classification 
(identification). A corresponding process is also performed during the form learning process. 

FIG. 3 is a flowchart showing the outline of the process in a form learning mode. When the process starts, the
management information extraction apparatus first inputs a document image to be learned (step S1) and extracts the
structure of the ruled lines (step S2). Then, the management information extraction apparatus asks the user for the
position of the management information and instructs the user to specify the position (step S3).

Then, the management information extraction apparatus extracts the features of the ruled lines for the rough
classification by discriminating solid lines from broken lines in the extracted structure of the ruled lines (step S4), and
extracts the features of the ruled lines indicating an outline (a contour) for detailed identification (step S5). For example,
the features of the structure of the ruled lines stable against a change in data are used as the features for the rough
classification. As the features for detailed identification, a cross ratio relating to the outline of a table is used in
consideration of a high-speed process.
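
As background for the cross ratio mentioned above, the following sketch computes the standard cross ratio of four collinear points and checks that it is unchanged by scaling, translation, and a projective map. It is an assumption that this standard definition is the relevant one; how the apparatus applies it to the outline of a table is described with FIGs. 15 and 16. The function name `cross_ratio` is hypothetical.

```python
def cross_ratio(a: float, b: float, c: float, d: float) -> float:
    """Standard cross ratio (a, b; c, d) of four distinct collinear points,
    each given by its coordinate along the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


points = [0.0, 1.0, 2.0, 3.0]
original = cross_ratio(*points)

# The same points after enlargement and shift, as when a form is scanned
# at a different size or position.
scaled = cross_ratio(*[2.0 * x + 5.0 for x in points])

# The same points after a projective map x -> x / (x + 1).
projected = cross_ratio(*[x / (x + 1.0) for x in points])
```

Because the value depends only on ratios of distances, a feature of this kind can be compared between a learned form and an input image without first normalising size or position.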

Then, the management information extraction apparatus stores the extracted features of the ruled lines and the 
specified position of the management information in the layout dictionary 31 (step S6), and terminates the process.
The stored information is referenced in an operation mode, and is used to extract the management information from 
an unknown document. 

FIG. 4 is a flowchart showing the outline of the process in an operation mode. When the process starts, the man- 
agement information extraction apparatus first inputs a document image to be processed (step S11) and extracts the 
ruled line structure (step S12).

Then, the management information extraction apparatus extracts the features of the ruled lines for the rough
classification from the ruled line structure (step S13), compares them with the corresponding information in the layout
dictionary 31, and performs the rough classification of the ruled line structure (step S14). As a result, the ruled line
structure in the layout dictionary 31 which possibly matches the ruled line structure of the input document is
extracted as a candidate.

Then, the management information extraction apparatus extracts the features of the ruled lines indicating an outline 
for detailed identification from the ruled line structure (step S15), compares them with the corresponding information 
about the candidate extracted in the rough classification, and identifies the details of the ruled line structure (step S16).
In this step, for example, a one-dimensional matching process is performed on the cross ratio to specify a candidate 
corresponding to an input document. 

Then, it computes the position of the management information in the input document image based on the position 
of the management information specified in the form of the candidate (step S17), and then terminates the process.
Thus, according to the position information specified by the user in the known document, management information 
can be extracted from the input document image with high precision. Since the form comparing process is performed 
in two steps of rough classification and detailed identification during the operation, candidates for detailed identification 
are limited, thereby speeding up the extracting process. 
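
The two-step comparison of rough classification followed by detailed identification can be sketched as below. This is a simplified illustration, not the procedure of FIG. 17: both feature sets are assumed to be plain numeric vectors, and an L1 distance stands in for the one-dimensional matching of the detailed step. The names `l1`, `identify`, and `rough_max` are hypothetical.

```python
from typing import Dict, Optional, Sequence

Form = Dict[str, Sequence[float]]  # {"rough": [...], "detail": [...]} (assumed layout)


def l1(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of absolute differences; infinite if the lengths differ."""
    if len(u) != len(v):
        return float("inf")
    return sum(abs(x - y) for x, y in zip(u, v))


def identify(query: Form, dictionary: Dict[str, Form],
             rough_max: float) -> Optional[str]:
    # Step 1: rough classification keeps only forms with similar coarse features.
    candidates = [name for name, form in dictionary.items()
                  if l1(query["rough"], form["rough"]) <= rough_max]
    if not candidates:
        return None
    # Step 2: detailed identification runs on the surviving candidates only.
    return min(candidates, key=lambda n: l1(query["detail"], dictionary[n]["detail"]))


dictionary = {
    "A": {"rough": [12, 4], "detail": [1.3, 1.5, 2.0]},
    "B": {"rough": [12, 5], "detail": [1.1, 1.9, 2.4]},
    "C": {"rough": [40, 0], "detail": [1.3, 1.5, 2.0]},
}
query = {"rough": [12, 4], "detail": [1.3, 1.5, 2.1]}
best = identify(query, dictionary, rough_max=2)
```

Form C has the same detailed features as form A in this example, but it is never compared in detail because the rough step has already excluded it, which is where the time saving comes from.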

The management information extraction apparatus according to the present embodiment can be realized by an
information processing device (computer) as shown in FIG. 5. The information processing device shown in FIG. 5
comprises the CPU 41, the memory 42, the input unit 43, an output unit 44, the external storage unit 45, a medium
drive unit 46, a network connection unit 47, and an opto-electrical conversion unit 48, and the units are
interconnected through a bus 49.

The CPU 41 executes a program using the memory 42, and performs each process shown in FIGs. 3 and 4. The
memory 42 can be a read only memory (ROM), a random access memory (RAM), etc. Necessary data such as the
layout dictionary 31, etc. is temporarily stored in the RAM.

The input unit 43 can be, for example, a keyboard, a pointing device, etc. and is used when a user inputs a request 
or an instruction. The output unit 44 can be, for example, a display device, a printer, etc. and is used when an inquiry 
is issued to a user or when a process result, etc. is output. 

The external storage unit 45 can be, for example, a magnetic disk device, an optical disc device, a magnetooptical 
disk device, etc., and stores a program and data. It also can be used as a database for storing images and the layout 
dictionary 31.

The medium drive unit 46 drives a portable storage medium 50 and accesses the contents stored therein. The
portable storage medium 50 can be an arbitrary computer-readable storage medium such as a memory card, a floppy
disk, a compact disk read only memory (CD-ROM), an optical disk, a magneto-optical disk, etc. The portable storage
medium 50 stores not only data but also a program for performing each of the above listed processes.

The network connection unit 47 is connected to an arbitrary communications network such as a local area network
(LAN), etc. and performs data conversion, etc. associated with communications. The management information
extraction apparatus can receive necessary data and programs from an external database, etc. through the network
connection unit 47. The opto-electrical conversion unit 48 can be, for example, an image scanner and receives an image
of a document, a drawing, etc. to be processed.

Next, each of the processes performed during the form learning process is described by referring to FIGs. 6 through
16.

FIG. 6 is a flowchart showing the details of the process performed during the form learning process. In FIG. 6, the
process steps corresponding to those in FIG. 3 are assigned identical numbers. In the ruled line extracting process in
step S2, the management information extraction apparatus extracts vertical and horizontal broken lines (step S2-1)
and vertical and horizontal solid lines (step S2-2) from an input document image as shown in FIG. 7, and then extracts
a rectangular cell (rectangular area) encompassed by the vertical and horizontal ruled lines (step S2-3).

When a ruled line and a rectangular cell are extracted, technologies such as the image extraction apparatus
(Japanese Patent laid-open H7-28937), the character-box extraction apparatus and the rectangle extraction apparatus
(Japanese Patent Application H7-203259), etc. disclosed by the Applicant of the present invention are used. According
to these technologies, a character box can be extracted or removed from the image without entering information about
the position, etc. of the ruled lines in a slip. Described below is the outline of the ruled line structure extracting process.

(1) Thinning process: to thin vertical and horizontal lines in a masking process to remove the difference in thickness
between characters and boxes.

(2) Segment extracting process: to extract a relatively long segment using an adjacent projection. The adjacent
projection refers to a method of defining a sum of a projection value of a picture element contained in an object
row or column and projection values of surrounding rows or columns, as a final projection value of the object row
or column. According to the projection method, the distribution of the picture elements surrounding a specific row
or column can be recognized from a global view point.

(3) Straight line extracting process: to sequentially search for extracted segments and check whether or not there
is a discontinuity of a distance equal to or longer than a predetermined distance between segments. Segments
having no such discontinuity are sequentially integrated to extract a long straight line.

(4) Straight line integrating process: to reintegrate extracted lines. Two or more line portions divided by a break
are re-integrated into a straight line.

(5) Straight line extending process: A straight line shortened by a break is extended and restored into an original
length only when the document is written as a regular slip.

(6) Determining horizontal lines forming part of a box: According to the rules indicated by 'Character Box Extraction
Apparatus and Rectangle Extraction Apparatus' (Japanese Patent Application H7-203259), a pair of horizontal
straight lines forming a row of entry boxes are extracted in two-line units as horizontal lines forming part of a
character box frame sequentially from an upper portion of a table.

(7) Determining vertical lines forming part of a box: Vertical lines forming part of a character box frame are
determined for each row of the above described entry boxes. A vertical line both ends of which reach the two horizontal
lines forming part of the object row is defined as a vertical line forming part of the row.

(8) Rectangular cell extracting process: A rectangular cell encompassed by two horizontal lines and two vertical
lines forming a box is extracted as a character area.
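
Steps (2) and (3) of the list above can be illustrated with a minimal sketch. It assumes a binary image stored as rows of 0/1 pixels, a neighbourhood of `k` rows on each side for the adjacent projection, and segments given as (start, end) pairs along one row; the names `adjacent_projection`, `integrate_segments`, `k`, and `max_gap` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]      # binary image: rows of 0/1 pixels (assumed representation)
Segment = Tuple[int, int]    # (start, end) coordinates along a row or column


def adjacent_projection(image: Image, k: int = 1) -> List[int]:
    """Row projection in which each row also receives the projection
    values of the k rows above and below it."""
    plain = [sum(row) for row in image]
    n = len(plain)
    return [sum(plain[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]


def integrate_segments(segments: List[Segment], max_gap: int) -> List[Segment]:
    """Join consecutive segments unless the gap between them is
    equal to or longer than max_gap."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


image = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],   # a horizontal line with one missing pixel
    [0, 0, 1, 0],   # the missing pixel has slipped to the next row
    [0, 0, 0, 0],
]
profile = adjacent_projection(image, k=1)
lines = integrate_segments([(0, 10), (12, 20), (30, 40)], max_gap=5)
```

In the example, the line that is split across two rows by skew still produces a strong peak in the adjacent projection, which a plain row projection would spread over two weaker values.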

Then, in the management information position specifying process in step S3, the management information
extraction apparatus displays an input document image on the screen of the display unit, and instructs a user to point to any
point in the character string indicating a title using a mouse as shown in FIG. 8. Then, it stores the position information
of the rectangular cell 51 containing the pointed position.

The position information about a rectangular cell 51 is defined based on an arbitrary intersection on the contour of a
table, and corresponds to the information about the vector from the intersection to the position of the rectangular cell
51. For example, if an upper left vertex 52, a lower left vertex 53, an upper right vertex 54, and a lower right vertex 55
are start points of a vector, then the data of difference vectors A, B, C, and D from each vertex respectively to an upper
left vertex 56, a lower left vertex 57, an upper right vertex 58, and a lower right vertex 59 is stored. Simultaneously,
the height h0 and the width w0 of a table, and the height H1 and the width W1 of a rectangular cell are stored.
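
As a rough illustration of the stored position information, the sketch below computes the difference vectors A, B, C, and D. It assumes, for simplicity, that both the table outline and the cell are axis-aligned rectangles given as (left, top, right, bottom); the function name and the tuple layout are hypothetical.

```python
def difference_vectors(table, cell):
    """Vectors from the four table vertices to the matching cell vertices.

    table, cell: (left, top, right, bottom) in image coordinates (assumed layout).
    """
    tl, tt, tr, tb = table
    cl, ct, cr, cb = cell
    return {
        "A": (cl - tl, ct - tt),  # upper left vertex 52 -> vertex 56
        "B": (cl - tl, cb - tb),  # lower left vertex 53 -> vertex 57
        "C": (cr - tr, ct - tt),  # upper right vertex 54 -> vertex 58
        "D": (cr - tr, cb - tb),  # lower right vertex 55 -> vertex 59
    }
```

Storing one vector per table vertex means the cell can later be located from whichever vertices turn out to be reliably detected.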

In the rough classification ruled line feature extracting process in step S4, the management information extraction
apparatus first counts the intersections of the horizontal and vertical ruled lines (step S4-1). Then, the crossing state
of each intersection is extracted to obtain the frequency distribution (step S4-2). The crossing state is represented by
a code (K1, K2, K3, K4) indicating the existence of a vertical or horizontal ruled line extending from the intersection,
and the type of the ruled line.

Element K1 refers to a ruled line above an intersection. Element K2 refers to a ruled line below an intersection. 
Element K3 refers to a ruled line at the left of an intersection. Element K4 refers to a ruled line at the right of an 
intersection. The value of each element is 0 when no ruled lines exist, 1 when a solid line exists, or 2 when a broken 
line exists. 

For example, the crossing state of the intersection shown in FIG. 9 is represented by (1,1,1,1). The crossing state
of the intersection shown in FIG. 10 is represented by (1,1,1,0). The crossing state of the intersection shown in FIG.
11 is represented by (0,2,2,2). The crossing state of the intersection shown in FIG. 12 is represented by (1,1,2,2). Since
each element of (K1, K2, K3, K4) can be assigned any of three values, the number of possible codes is 3^4 (= 81). In
step S4-2, an occurrence number (frequency) is obtained and stored for each code of 81 types.
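
The frequency distribution of step S4-2 can be sketched as follows, assuming each intersection has already been reduced to a (K1, K2, K3, K4) tuple. The function name is hypothetical.

```python
from collections import Counter
from itertools import product

def crossing_state_histogram(states):
    """Count how often each (K1, K2, K3, K4) code occurs.

    states: iterable of 4-tuples with elements 0 (no line), 1 (solid), 2 (broken).
    """
    counts = Counter(states)
    # Enumerate all 3**4 = 81 codes so that codes that never occur get frequency 0.
    return {code: counts.get(code, 0) for code in product((0, 1, 2), repeat=4)}
```

Keeping all 81 entries, including zeros, gives every form a histogram of the same length, which makes element-wise comparison straightforward.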

Next, the width-to-height ratio of each rectangular cell is computed, and the frequency distribution is computed as
that of a rectangular cell (step S4-3). When the height of a rectangular cell is H1 and its width is W1, the width-to-height
ratio can be represented by W1/H1. The frequency distribution of the width-to-height ratio can be obtained by increasing
the value of W1/H1 by 0.5 in succession starting from 0, and counting the rectangular cells having the width-to-height
ratio corresponding to each value. At this time, rectangular cells exceeding a threshold (for example, 10) are collectively
counted.
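
A minimal sketch of step S4-3 is given below. The text does not say how bin boundaries are handled, so half-open bins of width 0.5 are an assumption, as are the function and parameter names.

```python
def ratio_histogram(cells, step=0.5, threshold=10.0):
    """Frequency distribution of width-to-height ratios W1/H1.

    cells: list of (width, height) pairs with height > 0.
    Bins are [0, 0.5), [0.5, 1.0), ... up to the threshold (assumed convention);
    the final entry collects every ratio exceeding the threshold.
    """
    n_bins = int(threshold / step)
    hist = [0] * (n_bins + 1)
    for width, height in cells:
        ratio = width / height
        if ratio > threshold:
            hist[-1] += 1  # collectively counted
        else:
            # A ratio exactly equal to the threshold falls into the last regular bin.
            hist[min(int(ratio / step), n_bins - 1)] += 1
    return hist
```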

In the detailed identification outline ruled line feature extracting process in step S5, the management information 
extraction apparatus first retrieves an intersection string comprising four intersections from outside in the horizontal 
and vertical directions in each row or column containing intersections in series. 

For example, in the case of the ruled line structure shown in FIG. 13, intersections 61, 62, 63, and 64 are retrieved
when four intersections are retrieved sequentially from the left end in the second row. Intersections 65, 64, 63, and 62
are retrieved when four intersections are retrieved sequentially from the right end in that row. Intersections 66, 63, 67,
and 68 are retrieved when four intersections are retrieved sequentially from the top in the third column. Intersections
70, 69, 68, and 67 are retrieved when four intersections are retrieved sequentially from the bottom in that column.

The cross ratio of the one-dimensional projective invariants relating to the retrieved intersection string is computed. 
For example, if an intersection string comprising four intersections X1, X2, X3, and X4 is retrieved as shown in FIG. 
14, the cross ratio is expressed as follows. 



cross ratio = (|X1 - X2| * |X3 - X4|) / (|X1 - X3| * |X2 - X4|)   (1)

where |Xi - Xj| indicates the width (distance) between intersections Xi and Xj (i, j = 1, 2, 3, or 4). The cross ratio
of equation (1) is computed according to, for example, the flowchart shown in FIG. 15. When the cross ratio computing
process is started, the management information extraction apparatus inputs the coordinate data of the four intersections
X1, X2, X3, and X4 (step S21).

Then, the distance between intersections X1 and X2 is computed and input to variable a (step S22), the distance
between intersections X3 and X4 is computed and input to variable b (step S23), the distance between intersections
X1 and X3 is computed and input to variable c (step S24), and the distance between intersections X2 and X4 is computed
and input to variable d (step S25). Next, ab/cd is computed and the result is stored as a cross ratio (step S26), and
then, the process is terminated.
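
Steps S21 through S26 translate almost directly into code. The sketch below assumes each intersection is an (x, y) coordinate pair and that the four points are distinct; the function name is hypothetical.

```python
import math

def cross_ratio(x1, x2, x3, x4):
    """Cross ratio ab/cd of four collinear intersections given as (x, y) points."""
    a = math.dist(x1, x2)  # step S22
    b = math.dist(x3, x4)  # step S23
    c = math.dist(x1, x3)  # step S24
    d = math.dist(x2, x4)  # step S25
    return (a * b) / (c * d)  # step S26
```

Because the value is a ratio of products of distances, uniformly enlarging or reducing the image leaves it unchanged, which is what makes it useful for comparing forms scanned at different sizes.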

Thus, the features of a sequence of intersections around the outline of a table can be quantified by computing the
cross ratio of all intersection strings. As a result, the two-dimensional features of the outline of the table are represented
by a sequence of one-dimensional values as shown in FIG. 16. The sequence of values of a cross ratio is hereinafter
referred to as a cross ratio string.

In FIG. 16, the right cross ratio string R[1], R[2], R[3], ..., R[n] corresponds to the cross ratio indicating the feature
of the rightmost portion of each row. The left cross ratio string L[1], L[2], L[3], ..., L[m] corresponds to the cross ratio
indicating the feature of the leftmost portion of each row. The upper cross ratio string U[1], U[2], U[3], ..., U[w]
corresponds to the cross ratio indicating the feature of the top portion of each column. The lower cross ratio string
D[1], D[2], D[3], ..., D[v] corresponds to the cross ratio indicating the feature of the bottom portion of each column.

Normally, since the ruled line structure is not symmetrical at the leftmost and rightmost portions of a table, or there 
may be a break or distortion in a line in a part of an image, n does not always match m. Similarly, w does not necessarily 
match v. 

By integrating these cross ratio strings in the four directions into a single string, a feature vector (R[1], ..., R[n],
L[1], ..., L[m], U[1], ..., U[w], D[1], ..., D[v]) having the values of respective cross ratios as elements can be generated.
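
The construction of the feature vector can be sketched as below. This is a simplified illustration under stated assumptions: each row is given as the sorted x coordinates of its intersections and each column as the sorted y coordinates, rows or columns with fewer than four intersections are skipped, and no line breaks are modelled, so here n equals m and w equals v, which the text does not require. All names are hypothetical.

```python
def cross_ratio_1d(points):
    """Cross ratio of four 1-D coordinates, listed from the outside inward."""
    x1, x2, x3, x4 = points
    return (abs(x1 - x2) * abs(x3 - x4)) / (abs(x1 - x3) * abs(x2 - x4))

def outline_feature_vector(rows, cols):
    """Concatenate the right, left, upper, and lower cross ratio strings.

    rows: per row, x coordinates of its intersections, sorted left to right.
    cols: per column, y coordinates of its intersections, sorted top to bottom.
    """
    rows = [r for r in rows if len(r) >= 4]
    cols = [c for c in cols if len(c) >= 4]
    right = [cross_ratio_1d(r[::-1][:4]) for r in rows]  # four from the right end
    left = [cross_ratio_1d(r[:4]) for r in rows]         # four from the left end
    upper = [cross_ratio_1d(c[:4]) for c in cols]        # four from the top
    lower = [cross_ratio_1d(c[::-1][:4]) for c in cols]  # four from the bottom
    return right + left + upper + lower
```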

In this example, the ratios of the distances among four intersections are used as the features of the ruled lines
indicating the outline for detailed identification. Instead, the ratios of the distances among any number (at least two)
of intersections can be used. Also in this case, the feature of the outline can be represented by arranging the ratios in
a one-dimensional array.

In the process in step S6, the management information extraction apparatus stores in the layout dictionary 31 the
position of the management information specified in step S3 and the feature of the ruled lines obtained in steps S4
and S5 as the identification information (form information) about a table-formatted document.

Each process performed during the operation is described below by referring to FIGs. 17 through 22. 

FIG. 17 is a flowchart showing the details of the process performed during the operation. In FIG. 17, the process step
corresponding to the step shown in FIG. 4 is assigned the same identification number. First, in the ruled line structure
extracting process in step S12, the management information extraction apparatus extracts a vertical and horizontal
broken line (step S12-1), a vertical and horizontal solid line (step S12-2), and a rectangular cell encompassed by the
vertical and horizontal ruled lines (step S12-3) from an input document image as in the process in step S2 performed
in learning a form.

In the rough classification ruled line feature extracting process in step S13, the management information extraction
apparatus counts the intersections between horizontal and vertical ruled lines (step S13-1), obtains the frequency
distribution of the crossing state of each intersection (step S13-2), and computes the frequency distribution of the
width-to-height ratio of each rectangular cell as in the process in step S4 in learning a form.

In the rough classification process in step S14, the management information extraction apparatus compares the
obtained data with the form information about a number of tables in the layout dictionary 31 using the number of
intersections, the frequency distribution of crossing states, and the frequency distribution of the width-to-height ratios
of rectangular cells in order to limit the number of candidates for a corresponding table. In this example, appropriate
predetermined thresholds are set for respective features of the number of intersections, the frequency of crossing
states, and the frequency of width-to-height ratios of rectangular cells in consideration of a break or distortion in lines
of an image. If the form information of the layout dictionary 31 matches the information about the input image within a
predetermined allowance, it is defined as a candidate for the table.

For example, assuming that the number of intersections of an input document image is Ki and the number of
intersections of a form t stored in the layout dictionary 31 is Kt, the form t is defined as a candidate if the absolute value
|Ki - Kt| of the difference between the values is within the threshold THk. Thus, if the differences between the features
of the input image and the form information in the layout dictionary 31 are all within respective thresholds, then the
form is determined as a candidate for the form corresponding to the input document.
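
The thresholding rule can be sketched as follows. It assumes that the rough classification features (intersection count, the 81 crossing-state frequencies, and the ratio histogram) have been flattened into one list of numbers per form, with one threshold per element; this layout and all names are hypothetical.

```python
def is_candidate(input_features, form_features, thresholds):
    """True if every feature difference is within its threshold (e.g. |Ki - Kt| <= THk)."""
    return all(
        abs(a - b) <= th
        for a, b, th in zip(input_features, form_features, thresholds)
    )

def rough_classify(input_features, dictionary, thresholds):
    """Return the names of the dictionary forms that remain as candidates.

    dictionary: mapping from a form name to its stored feature list (assumed layout).
    """
    return [
        name
        for name, features in dictionary.items()
        if is_candidate(input_features, features, thresholds)
    ]
```

Only the forms returned here need to go through the more expensive detailed identification.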

Since the features of the number of intersections, crossing states, the frequency distribution of the sizes of rectangular
cells, etc. are normally stable against the fluctuation of image data, they can be used to precisely compare
data with a document image indicating a break or distortion in its lines.

In the detailed identification outline ruled line feature extracting process in step S15, the management information
extraction apparatus computes the cross ratio of the one-dimensional projective invariants from four directions as in
the process in step S5 performed in learning a form.

In the detailed identification process in step S16, the management information extraction apparatus compares
cross ratio strings only for the candidates for a table according to the rough classification. In this process, the cross
ratio strings are associated between the input form and the learned form individually in the four directions. Since the
structure of the object form is a table, the sequence of the ruled lines is not inverted between rows or columns. Therefore,
a dynamic programming (DP) matching is performed only with the partial loss of a ruled line due to a break or distortion
taken into account.

A DP matching is well-known as a method of matching time-series data such as voice, etc. which is described in
detail by, for example, "Pattern Recognition", p. 62 - p. 67 by Noboru Funakubo, published by Kyoritsu Publications. In
this method, similarity is assigned to a local feature of data and an evaluation function indicating the acceptability of
the entire correspondence is defined using the assigned similarity when two data sets are compared. The correspondence
of data is determined to obtain the highest value of the evaluation function.

FIG. 18 shows the comparing process of the right cross ratio string using the DP matching. In FIG. 18, the right
cross ratio string R[1], R[2], R[3], ..., R[n] of the input form corresponds to the right cross ratio string R'[1], R'[2],
R'[3], ..., R'[n'] of the learned form in the layout dictionary 31.

In this comparing process, the reliability of a ruled line is taken into account and the weight value of the correspondence
for an evaluation function is different between the cross ratio of an intersection string obtained from a
reliable ruled line and the cross ratio obtained from other ruled lines. For example, the similarity of the cross ratio
obtained from a reliable ruled line is assigned a higher weight value.

FIG. 19 is a flowchart showing an example of the comparing process for the right cross ratio string using the DP
matching. When the process starts, the management information extraction apparatus first stores the right cross ratio
string of the input form in the array R[i] (i = 1, ..., n), and stores the right cross ratio string of the learned form in the
array R'[k] (k = 1, ..., n') (step S31).

Then, the error array E[i, k] is initialized (step S32), and a computation is performed by the following recurrence
equation on i = 1, ..., n, k = 1, ..., n' (step S33).

E[i, k] = min{E[i-1, k] + d[i, k], E[i-1, k-1] + λ * d[i, k], E[i, k-1] + d[i, k]}   (2)

where E[i, k] indicates the minimum value of error accumulation when a part of the cross ratio string (R[1], ..., R[i])
is associated with (R'[1], ..., R'[k]). Therefore, when the accumulation error during the computing operation is used
as an evaluation function, E[i, k] provides its minimum value. d[i, k] indicates an error when R[i] is associated with
R'[k], and is computed, for example, by the following equation.

d[i, k] = |R[i] - R'[k]|   (3)

where λ indicates a weight value for d[i, k], and min{} indicates the minimum value among the elements in the {}.

Next, the path of E[n, n'], which includes correspondence relations of cross ratios used to determine the value of
E[n, n'], is computed (step S34), the result is stored as the correspondence between the cross ratio strings
(R[1], ..., R[n]) and (R'[1], ..., R'[n']) (step S35), and the process terminates. Thus, the correspondence between cross
ratios is determined to obtain the minimum value of the evaluation function. The comparing processes on the left, top,
and bottom cross ratio strings are performed similarly.
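
The DP matching of steps S31 through S35 can be sketched as follows. The recurrence used here is an assumption: it is the common three-term form in which each step adds the local error d[i, k] = |R[i] - R'[k]|, with the weight λ (`lam`) applied on the diagonal step. Both strings are assumed to be non-empty, and the function name is hypothetical.

```python
def dp_match(r, r_prime, lam=1.0):
    """Associate two cross ratio strings by dynamic programming.

    Returns (accumulated error, list of 1-based (i, k) correspondences).
    """
    n, m = len(r), len(r_prime)
    inf = float("inf")
    # E[i][k]: minimum accumulated error when r[:i] is associated with r_prime[:k].
    E = [[inf] * (m + 1) for _ in range(n + 1)]
    E[0][0] = 0.0  # step S32: initialization
    back = {}
    for i in range(1, n + 1):  # step S33: recurrence
        for k in range(1, m + 1):
            d = abs(r[i - 1] - r_prime[k - 1])
            E[i][k], back[(i, k)] = min(
                (E[i - 1][k] + d, (i - 1, k)),
                (E[i - 1][k - 1] + lam * d, (i - 1, k - 1)),
                (E[i][k - 1] + d, (i, k - 1)),
            )
    # Step S34: trace the path that produced E[n][n'] back to the start.
    pairs, node = [], (n, m)
    while node != (0, 0):
        pairs.append(node)
        node = back[node]
    return E[n][m], pairs[::-1]
```

When the learned form has one more ruled line than the input form, one input cross ratio ends up paired with two learned ones, which is the situation in which an end point is later treated as not stable.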

In step S16, such a one-dimensional DP matching is performed on all learned forms obtained by the rough classification,
and the form indicating the minimum (best) evaluation function is determined to be the form corresponding
to the input form. Thus, in the detailed identification, a high-speed process can be performed by the identification using
the features of the outline (contour) of a table structure through the one-dimensional matching.

In the management information position computing process in step S17, the management information extraction
apparatus refers to the layout dictionary 31, retrieves the position information about the learned form specified in the
detailed identification, and extracts the management information from the input image according to the retrieved position
information.

In this process, the matching level is checked at the intersection (end point) at both ends of each row and each
column using the result of the correspondence of the cross ratio strings in the above described DP matching to determine
whether or not the end points are stable. A matching level at an end point refers to the probability of the correspondence
between the cross ratio of an input form and the cross ratio of a learned form.

For example, since R[1] and R'[1] uniquely (one-to-one) correspond to each other in FIG. 18, it is determined that
the right end point of the first row is stable. Since R[3] and R'[4] also correspond one-to-one to each other, the right
end point of the corresponding row is stable. However, since R[2] corresponds to both R'[2] and R'[3] and does not
uniquely correspond to either of them, it is determined that the right end point of the corresponding row is not stable.
Thus, the stable end point for each of the upper left, lower left, upper right, and lower right vertex is obtained and
defined as a stable point on the outline.
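
The one-to-one test can be sketched as below, assuming the DP matching result is available as a list of (i, k) index pairs meaning that R[i] corresponds to R'[k]. This pair representation and the function name are hypothetical.

```python
from collections import Counter

def stable_indices(pairs):
    """Indices i whose cross ratio R[i] corresponds one-to-one to some R'[k].

    pairs: list of (i, k) correspondences from the DP matching (assumed format).
    """
    per_input = Counter(i for i, _ in pairs)
    per_learned = Counter(k for _, k in pairs)
    return [i for i, k in pairs if per_input[i] == 1 and per_learned[k] == 1]
```

With the correspondences described for FIG. 18, namely (1, 1), (2, 2), (2, 3), and (3, 4), the stable indices are 1 and 3, while index 2 is rejected because R[2] corresponds to two learned cross ratios.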

Then, the height h0 and the width w0 of the tables of the input form and the learned form are obtained based on
the stable outline points, and are compared with each other to obtain the relative ratios between the heights and the
widths of the learned form and the input form. Then, the position of the management information is computed
based on the difference vectors A, B, C, and D shown in FIG. 8, and the height H1 and the width W1 of the rectangular
cell. The above described ratio indicates either an enlargement ratio or a reduction ratio of the table of an input form
to the table of a learned form, and is used to normalize the fluctuation between the tables.

For example, when the ratios of the height and the width of the input form to those of the table shown in FIG. 8
are α, the sizes of the difference vectors A, B, C, and D are multiplied by α. Then, in the table of the input form, the
approximate position of the upper left vertex of the rectangular cell containing the management information is obtained
from the scaled difference vector A, with the stable outline point at the upper left vertex as a starting point. Similarly,
the approximate positions of the upper right, lower left, and lower right vertexes of the rectangular cell can be obtained
by multiplying the difference vectors B, C, and D by α, with the stable outline points at the
upper right, lower left, and lower right vertexes as starting points.

Next, a rectangular cell which is located near the obtained positions and is nearly equal to H1*α and W1*α
respectively in height and width is searched for. Then, the data in the rectangular cell, such as a character string, etc., is
extracted as requested management information.

FIGs. 20, 21, and 22 are flowcharts showing an example of the management information position computing process.
When the process starts, the management information extraction apparatus first inputs the result of associating
the cross ratio strings in the four directions during the DP matching (step S41).

In this process, the results of associating the right cross ratio string (R[1], ..., R[n]) with (R'[1], ..., R'[n']), the left
cross ratio string (L[1], ..., L[m]) with (L'[1], ..., L'[m']), the upper cross ratio string (U[1], ..., U[w]) with
(U'[1], ..., U'[w']), and the lower cross ratio string (D[1], ..., D[v]) with (D'[1], ..., D'[v']) are input.

Next, stable end points of the input form are computed from the data, and are defined as candidates for stable
outline points (step S42). The cross ratios corresponding to the candidates are respectively expressed as R[nmin],
R[nmax], L[mmin], L[mmax], U[wmin], U[wmax], D[vmin], and D[vmax].

'nmin' indicates the row number of the uppermost point corresponding to the minimum y coordinate value of all
stable rightmost points in the table. 'nmax' indicates the row number of the lowermost point corresponding to the
maximum y coordinate value of all stable rightmost points in the table. 'mmin' indicates the row number of the uppermost
point of all stable leftmost points in the table. 'mmax' indicates the row number of the lowermost point of all stable
leftmost points in the table.

'wmin' indicates the column number of the leftmost point corresponding to the minimum x coordinate value of all
stable uppermost points in the table. 'wmax' indicates the column number of the rightmost point corresponding to the
maximum x coordinate value of all stable uppermost points in the table. 'vmin' indicates the column number of the
leftmost point of all stable lowermost points in the table. 'vmax' indicates the column number of the rightmost point of
all stable lowermost points in the table.

Then, the positions of the stable outline points are computed according to the data of obtained candidates (step
S43). The maximum and minimum values of the x and y coordinates of each candidate are obtained and the values
are used as coordinate elements of stable outline points.

In FIG. 20, for example, XMIN {R[nmin], R[nmax], L[mmin], L[mmax], U[wmin], U[wmax], D[vmin], and D[vmax]}
indicates the minimum value of the x coordinate of the end point corresponding to the value of each cross ratio in {}.
Similarly, XMAX {} indicates the maximum value of the x coordinate of each end point, YMIN {} indicates the minimum
value of the y coordinate of each end point, and YMAX {} indicates the maximum value of the y coordinate of each end
point.

These values XMIN {}, XMAX {}, YMIN {}, and YMAX {} are respectively represented by XMIN, XMAX, YMIN, and
YMAX for simplicity. At this time, the coordinates of the stable outline points at the upper left, upper right, lower left,
and lower right portions are respectively represented by (XMIN, YMIN), (XMAX, YMIN), (XMIN, YMAX), and (XMAX,
YMAX).
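
Step S43 amounts to taking coordinate extremes over the candidate end points. A minimal sketch, assuming the candidates are supplied as a non-empty list of (x, y) points; the function name and the dictionary keys are hypothetical.

```python
def stable_outline_points(end_points):
    """Corner coordinates built from the extremes of the stable end points.

    end_points: non-empty list of (x, y) coordinates of candidate end points.
    """
    xs = [x for x, _ in end_points]
    ys = [y for _, y in end_points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    return {
        "upper_left": (xmin, ymin),
        "upper_right": (xmax, ymin),
        "lower_left": (xmin, ymax),
        "lower_right": (xmax, ymax),
    }
```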

Then, the stable end points of the dictionary form, that is, a learned form, are computed and defined as candidates
for stable outline points (step S44 in FIG. 21). The cross ratios corresponding to the candidates are respectively
represented by R'[nmin'], R'[nmax'], L'[mmin'], L'[mmax'], U'[wmin'], U'[wmax'], D'[vmin'], and D'[vmax'].

The meanings of nmin', nmax', mmin', mmax', wmin', wmax', vmin', and vmax' are the same as the meanings of
the above described nmin, nmax, mmin, mmax, wmin, wmax, vmin, and vmax.

Using the obtained data of the candidates, the positions of the stable outline points of the dictionary form are
computed as in step S43 (step S45). In FIG. 21, the meanings of XMIN' {}, XMAX' {}, YMIN' {}, and YMAX' {} are the
same as those of the above described XMIN {}, XMAX {}, YMIN {}, and YMAX {}.

These values XMIN' {}, XMAX' {}, YMIN' {}, and YMAX' {} are respectively represented by XMIN', XMAX', YMIN',
and YMAX' for simplicity. At this time, the coordinates of the stable outline points at the upper left, upper right, lower
left, and lower right portions are respectively represented by (XMIN', YMIN'), (XMAX', YMIN'), (XMIN', YMAX'), and
(XMAX', YMAX').

According to the coordinate information about the stable outline points obtained in step S43, the height h0 and the
width w0 of the input form are computed by the following equations (step S46 in FIG. 22).

w0 = XMAX - XMIN   (4)

h0 = YMAX - YMIN   (5)

According to the coordinate information about the stable outline points obtained in step S45, the height h0' and
the width w0' of the dictionary form are computed by the following equations (step S47).

w0' = XMAX' - XMIN'   (6)

h0' = YMAX' - YMIN'   (7)

Using the heights h0 and h0' and the widths w0 and w0', the ratios Sw and Sh (enlargement ratio or reduction ratio)
of the size of the input form to the size of the dictionary form are computed (step S48).

Sw = w0/w0'   (8)

Sh = h0/h0'   (9)



The size of the element of the difference vector having a stable outline point of a table of a dictionary form as a
starting point is obtained as a relative coordinate value indicating the position of management information (step S49).
In this case, the difference vector from a plurality of outline points near each vertex in the outline points corresponding
to the cross ratios R'[1], ..., R'[n'], L'[1], ..., L'[m'], U'[1], ..., U'[w'], and D'[1], ..., D'[v'] is assumed to be preliminarily
stored as position information in the dictionary 31.

The relative coordinate values from the upper left, upper right, lower left, and lower right stable points are respectively
set as (fxmin1, fymin1), (fxmax1, fymin2), (fxmin2, fymax1), and (fxmax2, fymax2).

Then, based on the relative coordinate values and the ratios Sw and Sh of the size of the input form to the size of
the dictionary form, a rough estimation of the position of the management information in the input form is performed
(step S50). In this process, four points having the following coordinate values are obtained as candidates for the position
of the management information.

(XMIN + Sw * fxmin1, YMIN + Sh * fymin1)
(XMAX - Sw * fxmax1, YMIN + Sh * fymin2)
(XMIN + Sw * fxmin2, YMAX - Sh * fymax1)
(XMAX - Sw * fxmax2, YMAX - Sh * fymax2)
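
Steps S46 through S50 can be sketched together as follows. The outline of each form is assumed to be passed as an (XMIN, YMIN, XMAX, YMAX) tuple, and the relative coordinate values as a dictionary keyed by the names fxmin1, fymin1, fxmax1, fymin2, fxmin2, fymax1, fxmax2, and fymax2; this data layout and the function name are hypothetical.

```python
def candidate_positions(input_outline, dict_outline, rel):
    """Estimate four candidate positions of the management information.

    input_outline, dict_outline: (XMIN, YMIN, XMAX, YMAX) of the stable outline points.
    rel: relative coordinate values stored for the dictionary form.
    """
    xmin, ymin, xmax, ymax = input_outline
    dxmin, dymin, dxmax, dymax = dict_outline
    sw = (xmax - xmin) / (dxmax - dxmin)  # equations (4), (6), (8)
    sh = (ymax - ymin) / (dymax - dymin)  # equations (5), (7), (9)
    points = [
        (xmin + sw * rel["fxmin1"], ymin + sh * rel["fymin1"]),
        (xmax - sw * rel["fxmax1"], ymin + sh * rel["fymin2"]),
        (xmin + sw * rel["fxmin2"], ymax - sh * rel["fymax1"]),
        (xmax - sw * rel["fxmax2"], ymax - sh * rel["fymax2"]),
    ]
    return sw, sh, points
```

Each candidate is measured from a different corner of the outline, so an error at one corner affects only one of the four estimates.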

Next, a rectangular cell of an input form containing the positions of these candidates is extracted (step S51). If the
height of the cell is nearly Sh times the height H1 of the rectangular cell specified in the dictionary form and the width
of the cell is nearly Sw times the width W1 of the rectangular cell specified in the dictionary form, then it is determined
that the rectangular cell contains management information.
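
The cell selection of step S51 can be sketched as below. The text does not quantify "nearly", so the relative tolerance `tol` is a hypothetical parameter, as are the function name and the (left, top, right, bottom) cell layout.

```python
def find_cell(cells, candidates, h1, w1, sh, sw, tol=0.2):
    """Return the first cell that contains all candidates and has the expected size.

    cells:      list of (left, top, right, bottom) rectangular cells of the input form.
    candidates: list of (x, y) candidate positions.
    tol:        hypothetical relative tolerance standing in for "nearly".
    """
    for left, top, right, bottom in cells:
        contains_all = all(
            left <= x <= right and top <= y <= bottom for x, y in candidates
        )
        height_ok = abs((bottom - top) - sh * h1) <= tol * sh * h1
        width_ok = abs((right - left) - sw * w1) <= tol * sw * w1
        if contains_all and height_ok and width_ok:
            return (left, top, right, bottom)
    return None  # no cell satisfies the conditions
```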

Then, the image data of a character string, etc. in the rectangular cell is output as management information (step
S52), thereby terminating the process. Thus, the management information is extracted from an input image according
to the result of detailed identification.

In this example, the dictionary 31 stores difference vectors with a part of a plurality of outline points corresponding
to the cross ratios of the dictionary form as starting points. However, difference vectors from all outline points can be
preliminarily stored to select not only the outline points near the vertexes of the table but also optional outline points
on the perimeter as stable outline points.

It is not always required to extract four stable outline points. That is, based on any one stable outline point as a
reference point, the position of management information can be obtained using the relative coordinate values from the
position of the reference point to quickly perform the process. In general, the number of stable outline points for the
process is specified arbitrarily.

In step S51, a rectangular cell containing four candidate positions is extracted. However, a rectangular cell containing
one or more candidate positions can be extracted, or a rectangular cell whose distance from one or more
candidate positions is within a predetermined value can be extracted.

In the above described management information extracting process, the form of an input document and the position
of management information can be automatically learned and stored in the layout dictionary 31. According to the
information, various table-formatted documents can be processed and the position of the management information can
be computed with high precision.

Described below in detail is the method of specifying the position of the management information in step S3 shown
in FIG. 6. In the present embodiment, the method of specifying the position of management information by a user can
be followed in either a user entry mode in which the user is instructed to explicitly specify the position or an automatic
learning mode in which a candidate for the management information is automatically extracted.

In the user entry mode, the management information extraction apparatus instructs the user to directly specify the
position of management information from among a number of rectangular cells forming a table as shown in FIG. 8. For
example, if there are a large number of documents having the same form of design drawings, etc. and the position of
the management information is specified on the first document, then only the position information should be read from
the second and the subsequent ones, thereby realizing a batch input using an automatic document feeder.

In the automatic learning mode, a plurality of areas which are candidates for an area containing management
information are extracted using the title extracting technology described in the former application 08/694,503, the
position of an area selected by the user from among the plurality of areas is automatically learned, and the position is
defined as the first candidate in the subsequent operations. If the user does not select any of the candidates, but
optionally specifies a new position, then information of that position is automatically input in the user's interactive
operation.

Otherwise, the title extracting technology disclosed by the former application can be applied to the user entry mode
to select management information from among a plurality of candidates. In this case, a form is recognized or identified
in the process shown in FIG. 4 in the automatic learning mode to check whether or not an input image matches the
form in the dictionary 31. If the input image matches any of the forms in the dictionary 31, its position information is
retrieved and presented to the user. Unless the input image matches any of the forms in the dictionary 31, a candidate
for the management information is extracted through the title extracting technology of the former application.

FIG. 23 shows the management information extracting process with the above described two modes. In the user
entry mode shown in FIG. 23, the management information extraction apparatus first extracts a plurality of candidates
for management information from an input image 71 of a table-formatted document in the intra-table title extracting
process based on the former application.

FIG. 24 is a flowchart showing the intra-table management information extracting process. When the process
starts, the management information extraction apparatus reads a document 71, and stores it as a document image in
the memory (step S61). In this example, the original image is stored after being converted into a compressed image.

Next, the document image is labelled, large rectangles are extracted based on the highest frequency value for the
height of a rectangle (step S62), rectangles encompassing a table (table rectangles) are extracted from the extracted
large rectangles (step S63), and a rectangle containing management information is selected from the table rectangles
(step S64). In this example, a table rectangle occupying the largest area is selected.

Then, a character string is extracted from the selected table rectangle, a rectangle circumscribing a character
string (character string rectangle) is obtained, and its coordinates are stored in the memory (step S65). Next, a rectangle
having a short width or a rectangle having a height longer than its width is removed from the stored character string
rectangles as a noise rectangle (step S66), and two or more character string rectangles are integrated into one rectangle
(step S67).

The character string rectangles extracted from the table are obtained in the above described processes. These
character string rectangles may contain a part of the ruled lines of the table. Therefore, the ruled line portions are
extracted from inside the character string rectangles, and the portions are used as the boundary for dividing character
string rectangles (step S68).

Next, the number of characters in a character string rectangle is counted to extract a character string rectangle
corresponding to management information (step S69). The obtained number of characters is used in the process in
step S72 as an attribute of the character string rectangle.

In the process in step S68, a character string rectangle is extracted for each box encompassed by the ruled lines
of a table. If the outline of the original table is not rectangular, a character string rectangle outside the table may exist.
Therefore, if a character string rectangle has no upper ruled line of a table when an upper ruled line is searched for,
then it is regarded as the character string rectangle outside the table and is removed (step S70).

Then, the character string rectangles in the table are rearranged in order from the one closest to the coordinate at
the upper left corner (step S71). When the number of characters in a character string rectangle satisfies a predetermined
condition, the character string rectangle is extracted as management information (step S72), thereby terminating
the process. If a plurality of character string rectangles satisfy the condition, they are determined
to be candidates for the management information in order from the one closest to the upper left corner of the
table rectangle.
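
The ordering and filtering of steps S71 and S72 can be sketched as follows. This is only an illustrative sketch: the `StringRect` type, the use of Euclidean distance to the upper left corner, and the `min_chars`/`max_chars` bounds standing in for the "predetermined condition" are all assumptions, since the text does not specify them.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class StringRect:
    x: int        # x coordinate of the upper left vertex
    y: int        # y coordinate of the upper left vertex
    n_chars: int  # number of characters counted in step S69


def rank_candidates(rects, table_origin=(0, 0), min_chars=2, max_chars=30):
    """Order rectangles by distance from the table's upper left corner (S71)
    and keep those whose character count satisfies the condition (S72).
    The character-count bounds are hypothetical placeholders."""
    ox, oy = table_origin
    ordered = sorted(rects, key=lambda r: hypot(r.x - ox, r.y - oy))
    return [r for r in ordered if min_chars <= r.n_chars <= max_chars]
```

The returned list is already in priority order, so its first element would correspond to the first candidate presented to the user.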

In this example, three candidates C1, C2, and C3 for management information are extracted in an image 77, and
a user interface 78 of the management information extraction apparatus outputs them in order from the highest priority
to present them to the user. The user selects one of them by pointing to it using a mouse when an appropriate candidate
is presented as management information. If no appropriate candidate is presented, the user can correct a candidate
for management information by explicitly specifying another rectangular cell by pointing to it using a mouse.

The management information extraction apparatus learns the position of the user-selected/corrected management
information, and stores the position information and ruled line structure in the dictionary 31 as a user dictionary 73.
Thus, the management information extraction apparatus can use the position information directly specified by the user
in the subsequent processes.

In the automatic learning mode shown in FIG. 23, the management information extraction apparatus first refers to 
a plurality of user dictionaries 73 and recognizes the forms of input images 71, 72, etc.

If the table-formatted input image 71 is input and it is determined, as a result of reference in the rough classification
and detailed identification, that it matches the form of any of the user dictionaries 73, then management information
C1 at the position specified in a resultant form 74 is output and presented to the user. If the user accepts the
management information C1, the information is adopted as is. If the user does not accept it, the user is instructed to select
appropriate information from among other position information C2, C3, etc.

If the input image 71 does not match any form in the user dictionary 73, the above described intra-table management
information extracting process is performed and the candidates C1, C2, C3, etc. for the management information
are extracted from a resultant image 75. The user interface 78 presents these candidates to the user in order
from the highest priority, and the user selects an appropriate candidate as management information from among the
presented candidates. If no appropriate candidate is presented, the candidates for management information can
be corrected by explicitly specifying another rectangular cell.

The management information extraction apparatus learns the position of the user-selected/corrected management 
information in the input image 71, and stores the position information and the ruled line structure as the user dictionary
73 in the dictionary 31 for use in the subsequent processes. 

If a normal non-table document image 72 is input, then it is determined as a result of recognizing the form that 
there are no ruled lines. Then, a plurality of candidates for management information are extracted in the title extracting 
process from a document image without ruled lines according to the former application. 

FIG. 25 is a flowchart showing this management information extracting process. When the process starts, the 
management information extraction apparatus reads the document 72 and stores it as a document image in the memory 
(step S81). In this process, the original image is stored after being converted into a compressed image.

Next, the document image is labelled, a character string is extracted as a result of the labelling process, and the 
coordinate of the character string rectangle is stored in the memory (step S82). Then, a rectangle having a short width 
or having a width shorter than its height is removed as a noise rectangle from the stored character string rectangles 
(step S83), and additionally a rectangle which does not seem to be a character string is removed. Then, a document 
area is determined (step S84). 

The remaining character string rectangles are rearranged in the vertical direction (in the y-coordinate direction)
(step S85). A rectangle containing an image of a character box (character box rectangle) is extracted, and then a
character string rectangle in the character box rectangle is marked as a rectangle with a character box (step S86).
Furthermore, a rectangle containing an underline image is extracted, and the character string rectangle right above
the extracted rectangle is marked as an underline rectangle (step S87).

Next, a point-counting process is performed to determine the probability of a title based on the features such as
the position of a character string rectangle in the document, character size, whether or not it is a rectangle with a
character box or an underline rectangle, etc. to extract one or more high-point character string rectangles as candidates
for a title (step S88). Based on the result, the source and destination information about the document is extracted
(steps S89 and S90). Thus, the title, destination, and source information is extracted as a candidate for management
information.
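
As a rough illustration of the point-counting idea in step S88, the sketch below scores each character string rectangle from the features named above. The point values, the "top third of the page" rule, the 1.2 size factor, and the dictionary keys are all hypothetical; the text names the features but not the weights.

```python
def title_points(rect, page_height, body_char_height):
    """Hypothetical point table for one character string rectangle."""
    points = 0
    if rect["y"] < page_height / 3:                    # near the top of the page
        points += 2
    if rect["char_height"] > 1.2 * body_char_height:   # larger than body text
        points += 2
    if rect.get("boxed"):                              # rectangle with a character box
        points += 1
    if rect.get("underlined"):                         # underline rectangle
        points += 1
    return points


def title_candidates(rects, page_height, body_char_height, top_n=3):
    """Return the top_n rectangles in descending order of points."""
    ranked = sorted(
        rects,
        key=lambda r: title_points(r, page_height, body_char_height),
        reverse=True,
    )
    return ranked[:top_n]
```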

In this example, in the image 76, three candidates C4, C5, and C6 for a title and the destination and source 
information are extracted. The user interface 78 outputs these data in order from the highest priority and presents them 
to the user. The user selects one of them by pointing to it using a mouse when an appropriate candidate is presented 
as management information. If no appropriate candidate is presented, the candidate for the management information can
be corrected by explicitly specifying another character string rectangle in the pointing process.

Next, the usage of the extracted management information is explained by referring to FIGs. 26 through 28. Conventionally,
only keywords or character codes of document names, etc. are used as management information for use
in handling images. However, the electronic filing system provided with the management information extraction appa- 
ratus according to the present invention has the function of storing a part of a document image as an index in addition 
to character codes. Thus, retrieval using an image can be effective when the reliability of character codes is low. 

The system according to the present invention allows the user to select the storing method for management infor- 
mation using a character code or an image code. Based on the selection result, selected data is stored as management 
information. When an image is retrieved, the system instructs the user to select a method of retrieving management 
information, and the management information is retrieved using a character code or an image based on the selection 
result. The system also has the function of simply browsing the stored character codes or images. 

FIG. 26 is a flowchart showing the image information storing process. When the process starts, the electronic filing
system first receives a document image (step S101), computes the position of the management information in the
process as shown in FIG. 4, and extracts a character string of management information (step S102). Then, the system
instructs the user to select a method of storing management information for the extracted character string (step S103).

The storing method is either a character recognition mode, in which a character string is character-recognized
and converted into a character code, or an image mode, in which a character string is not character-recognized but
stored as an image. If the user selects the character recognition mode, characters are recognized (step S104), and a
storing method is selected depending on the reliability of the recognition result (step S105).

The method of computing the reliability of character recognition is, for example, to use the technology disclosed 
in the "Character Recognition Method and Apparatus" according to a former application (Japanese Patent Application 
H8-223720). According to this technology, the system first computes a probability parameter from the distance value 
between the character code obtained as a recognition result and an input character pattern, and generates a conversion
table for use in converting the probability parameter into a correct recognition probability using a set of character
patterns and correctly-recognized codes. Based on the conversion table, the correct recognition probability to the 
probability parameter is obtained, and the correct recognition probability is used as the reliability of the recognition 
result. 

If the reliability of character recognition is lower than a predetermined threshold, then the user is notified that an
image is stored, and the image of the character string as well as its character code is stored as management information
(step S106), thereby terminating the process. If the reliability is equal to or higher than the predetermined threshold,
then the character code is stored as management information (step S107), thereby terminating the process. 

If the user selects the image mode, then an image of a character string is stored as management information (step
S108), thereby terminating the process. In step S103, it is possible to enter a mode in which both a character code
and an image code are stored as an alternative storing method. Assuming that the information about the distance value
between the character code obtained as a recognition result and the input character pattern indicates the reliability in 
step S105, it can be determined that the smaller the distance value is, the higher the reliability becomes. 

FIG. 27 shows an example of a storage table for storing management information. The management information 
storage table has a character code storage area, an image storage area, and a type flag area indicating whether
information is stored as a character code or an image code.

For example, the type flag 0 indicates that only the character code is stored. The type flag 1 indicates that only 
the image code is stored. The type flag 2 indicates that both the character code and image code are stored. 
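
The storing decision of steps S103 through S108 and the type flag of FIG. 27 can be sketched together as follows. The `ManagementRecord` type, the `store` function, the mode strings, and the default threshold of 0.8 are assumptions made for illustration; only the flag values 0, 1, and 2 and the branching on reliability come from the description.

```python
from dataclasses import dataclass
from typing import Optional

CODE_ONLY, IMAGE_ONLY, CODE_AND_IMAGE = 0, 1, 2  # type flags of FIG. 27


@dataclass
class ManagementRecord:
    type_flag: int
    char_code: Optional[str] = None   # character code storage area
    image: Optional[bytes] = None     # image storage area


def store(mode, image, recognized=None, reliability=0.0, threshold=0.8):
    """Build one storage table entry. The threshold value is hypothetical."""
    if mode == "image":                                   # step S108
        return ManagementRecord(IMAGE_ONLY, image=image)
    if reliability < threshold:                           # step S106
        return ManagementRecord(CODE_AND_IMAGE, recognized, image)
    return ManagementRecord(CODE_ONLY, recognized)        # step S107
```

Keeping the image alongside a low-reliability character code means that a later image retrieval can still find the document even when the recognized code is wrong.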

FIG. 28 is a flowchart showing the management information retrieving process for retrieving such management 
information. When the process starts, the electronic filing system first instructs the user to select a method of retrieving
management information (step S111). There are three retrieving modes, that is, a mode using character
codes, a mode using images, and a mode displaying a list of character codes and images to be browsed by a user.

When a user selects character code retrieval, management information is retrieved using a character code (step
S112). When a user selects image retrieval, management information is retrieved using an image (step S113). When
a user selects browsing, a list of character codes and images stored in the management information storage table is
displayed (step S114). After the selection, the process terminates.

When information is retrieved using images in step S113, the user is instructed to designate a specific image file
or an appropriate image is selected and displayed. Then, the user is instructed to designate a specific rectangular
portion as a retrieval key, and the user-designated portion of the image is compared with the image stored in the
management information storage table. The comparison between images is made using a well-known template matching
method described in, for example, "Digital Image Process for Recognizing Image [I]" by Jun'ichiro Toriwaki, published by
Shokodo.

In the template matching, the designated portion of the image is used as a model (template) with which the image
in each management information storage table is compared in computing the similarity between them to obtain management
information indicating the highest similarity or indicating similarity higher than a predetermined value. A document
image corresponding to the obtained management information is displayed as a retrieval result.
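
A minimal sketch of retrieval by template matching is given below. The text only refers to a well-known template matching method, so the similarity measure used here (the fraction of agreeing pixels at the best sliding position, for binary images held as nested lists) is an assumed stand-in, and the names `best_match_score`, `retrieve`, and `min_similarity` are hypothetical.

```python
def best_match_score(image, template):
    """Slide the template over the image and return the highest fraction
    of agreeing pixels. Both arguments are lists of rows of 0/1 values."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = 0.0
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            agree = sum(
                image[top + r][left + c] == template[r][c]
                for r in range(th)
                for c in range(tw)
            )
            best = max(best, agree / (th * tw))
    return best


def retrieve(stored_images, template, min_similarity=0.9):
    """Return keys of stored images whose similarity reaches the
    (hypothetical) threshold, most similar first."""
    scored = [(best_match_score(img, template), key)
              for key, img in stored_images.items()]
    hits = [(s, k) for s, k in scored if s >= min_similarity]
    return [k for s, k in sorted(hits, reverse=True)]
```

A production system would more likely use a correlation-based measure that tolerates noise and small shifts; the exhaustive comparison here is only meant to show the shape of the computation.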

According to such an electronic filing system, a character string of management information is not only stored/retrieved
using character codes, but also can be stored/retrieved using images. Therefore, characters which are difficult
to recognize correctly, such as textured characters, designed fonts, logos, etc., can be processed as management
information.

In steps S15 and S16 in FIG. 17, the cross-ratio DP matching is used to identify a table-formatted document form
(structure of format). However, the detailed identification can be performed by any other method.

In another well-known automatic form identifying method, the feature of a known table-formatted document form
is entered as a model in the dictionary 31. When an image of an unknown table-formatted document is input, the feature
is computed from the image, it is compared with the model in the dictionary using a model matching method, and the
model indicating the highest similarity is obtained.

In a model matching method, the entire table is first normalized, the position of the central point of each rectangular
cell is computed, and a vote is cast for each model having a central point at almost the same position as the above described
rectangular cell. The model which obtains the largest number of votes is defined as the optimum model. The normalization

Next, rough information is extracted about each of the horizontal and vertical ruled lines (step S124). Rough information
refers to relative values indicating the length and position of a ruled line with respect to the entire table, and is represented
by a set of three integers. Then, considering all combinations of two ruled lines in each of the vertical and horizontal
directions, detailed information relating to each combination is extracted (step S125). The detailed information expresses
the relative relation in length and position between two ruled lines.

The rough information and detailed information about a model to be compared with an input image are preliminarily 
extracted and stored in the layout dictionary 31. Therefore, the rough information and detailed information about the 
input image are compared with those about the model for model matching (step S126). The optimum model is output
as an identification result (step S127), thereby terminating the process.
Next, the processes in steps S124, S125, S126, and S127 are described in detail by referring to FIGs. 31 through 41.

In step S124, the reference width W, reference height H, reference x coordinate x0, and reference y coordinate
y0 are obtained as a preprocess prior to obtaining the rough information. First, the maximum length is obtained for
horizontal ruled lines. Among the horizontal ruled lines indicating a length ratio to the maximum length higher than or equal
to a predetermined threshold (for example, 0.8), the first and the last ruled lines are obtained as reference contour horizontal ruled lines.
The maximum length is obtained also for vertical ruled lines. As in the case of horizontal ruled lines, two reference
contour vertical ruled lines are obtained. Then, with respect to a circumscribing rectangle of the obtained four reference
contour ruled lines, a reference width W, a reference height H, and a reference point at the upper left vertex having
the reference coordinates (x0, y0) are determined.

For example, in the table-formatted document as shown in FIG. 31, horizontal ruled lines 81 and 82 are extracted
as reference contour horizontal ruled lines, and vertical ruled lines 83 and 84 are extracted as reference contour vertical
ruled lines. The width of the circumscribing rectangle of the reference contour ruled lines is regarded as the reference
width W and its height as the reference height H. The coordinates of the upper left vertex 85 of the circumscribing
rectangle are regarded as the reference coordinates (x0, y0).

Short ruled lines such as the horizontal ruled lines 86 and 87 can be removed from candidates for the reference
contour ruled lines by selecting reference contour ruled lines from among the ruled lines longer than a length computed
from the maximum length.
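
The selection of reference contour ruled lines can be sketched as follows. This is a sketch under assumptions: each ruled line rectangle is given as a hypothetical tuple `(x_min, y_min, x_max, y_max)`, "first and last" is taken to mean topmost and bottommost for horizontal lines and leftmost and rightmost for vertical lines, and the function name `reference_frame` is illustrative.

```python
def reference_frame(h_lines, v_lines, ratio=0.8):
    """Return (W, H, x0, y0) from ruled line rectangles.
    Each line is (x_min, y_min, x_max, y_max); ratio is the length-ratio
    threshold (0.8 is the example value given in the text)."""

    def contour_pair(lines, length, order):
        longest = max(length(l) for l in lines)
        # Drop short lines, then take the first and last in sorted order.
        kept = sorted((l for l in lines if length(l) >= ratio * longest),
                      key=order)
        return [kept[0], kept[-1]]

    refs = (
        contour_pair(h_lines, lambda l: l[2] - l[0], lambda l: l[1])
        + contour_pair(v_lines, lambda l: l[3] - l[1], lambda l: l[0])
    )
    # Circumscribing rectangle of the four reference contour ruled lines.
    x0 = min(l[0] for l in refs)
    y0 = min(l[1] for l in refs)
    W = max(l[2] for l in refs) - x0
    H = max(l[3] for l in refs) - y0
    return W, H, x0, y0
```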

The above described reference width W, height H, and coordinates (x0, y0) can also be obtained as follows. First,
coordinate values vmaxx, vminx, vmaxy, vminy, hmaxx, hminx, hmaxy, hminy are defined as the candidates for reference
coordinates as follows.

vmaxx = (maximum value of x coordinate of lower right vertex of vertical ruled line rectangle)
vminx = (minimum value of x coordinate of upper left vertex of vertical ruled line rectangle)
vmaxy = (maximum value of y coordinate of lower right vertex of vertical ruled line rectangle)
vminy = (minimum value of y coordinate of upper left vertex of vertical ruled line rectangle)
hmaxx = (maximum value of x coordinate of lower right vertex of horizontal ruled line rectangle)
hminx = (minimum value of x coordinate of upper left vertex of horizontal ruled line rectangle)
hmaxy = (maximum value of y coordinate of lower right vertex of horizontal ruled line rectangle)
hminy = (minimum value of y coordinate of upper left vertex of horizontal ruled line rectangle) ...(10)

Next, according to these coordinate values, candidates for a reference width and a reference height are obtained
by the following equations.



W1 = vmaxx - vminx
W2 = hmaxx - hminx
H1 = hmaxy - hminy
H2 = vmaxy - vminy (11)



The reference width W is obtained by

W = max{W1, W2} (12)

where x0 = vminx when W = W1 and x0 = hminx when W = W2.

The reference height H is obtained by

H = min{H1, H2} (13)

where y0 = hminy when H = H1 and y0 = vminy when H = H2.

length1 = integer portion of [(L1/W) × 100]
twist = integer portion of [((x1 - x0)/W) × 100] (14)
position = integer portion of [((y1 - y0)/H) × 100]

length1 = integer portion of [(L1/H) × 100]
twist = integer portion of [((y1 - y0)/H) × 100] (15)
position = integer portion of [((x1 - x0)/W) × 100]
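
Equations (14) and (15) can be written as a small function. The sketch assumes that L1 is the length of the ruled line rectangle, that (x1, y1) is the reference vertex of that rectangle, and that equation (14) applies to horizontal ruled lines and equation (15) to vertical ruled lines; the function name and the tuple layout are hypothetical.

```python
def rough_info(line, W, H, x0, y0, horizontal=True):
    """Return (length1, twist, position) as percentages of the table size.
    line is (x1, y1, L1). int() truncates toward zero, which is taken here
    as the 'integer portion'."""
    x1, y1, L1 = line
    if horizontal:                       # equation (14)
        return (
            int(L1 * 100 / W),
            int((x1 - x0) * 100 / W),
            int((y1 - y0) * 100 / H),
        )
    return (                             # equation (15)
        int(L1 * 100 / H),
        int((y1 - y0) * 100 / H),
        int((x1 - x0) * 100 / W),
    )
```

Because all three values are relative to W and H, the same form scanned at a different resolution or with a different margin should yield nearly the same triples.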



dw = x2 - x1
dh = y2 - y1 (16)

located under the center of the ruled line rectangle 93, dh is a positive value. If the center of the ruled line rectangle
94 is located above the ruled line rectangle 93, dh is a negative value.

The above described three features length2, differ, and height are computed by the following equation.

length2 = L2/L1
differ = dw/L1 (17-1)
height = dh/L1



Similarly, all combinations of two vertical ruled lines are extracted. In each combination, the length of one ruled
line rectangle 95 (a higher sorting order) is L1, the central coordinates of the rectangle 95 are (x1, y1), the length of
the other ruled line rectangle 96 (a lower sorting order) is L2, and the central coordinates of the rectangle 96 are (x2,
y2) as shown in FIG. 35. Then, dw and dh are obtained by equation (16), and detailed information length2, differ, and
height are computed by the following equation.

length2 = L2/L1
differ = dh/L1 (17-2)
height = dw/L1

In equation (17-2), compared with equation (17-1), the definitions of differ and height are reversed. Then, in step
S126, the similarity of a form is computed by comparing the rough information and detailed information about an input
image with those about each model. The comparison is made separately for horizontal ruled lines and vertical ruled
lines.
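
The detailed information of equations (16), (17-1), and (17-2) can be sketched as one function. The tuple layout `(x, y, L)` for the central coordinates and length of a ruled line rectangle, and the function name, are assumptions for illustration.

```python
def detailed_info(first, second, horizontal=True):
    """Return (length2, differ, height) for a pair of ruled lines.
    first is the line of higher sorting order, second the lower one;
    each is (x_center, y_center, length)."""
    (x1, y1, L1), (x2, y2, L2) = first, second
    dw, dh = x2 - x1, y2 - y1              # equation (16)
    if horizontal:                         # equation (17-1)
        return L2 / L1, dw / L1, dh / L1
    return L2 / L1, dh / L1, dw / L1       # equation (17-2)
```

Swapping differ and height for vertical lines keeps "differ" as the offset along the line direction and "height" as the offset across it, so both orientations can share the same comparison logic.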

FIG. 36 is a flowchart showing such a model matching process. When the process starts, the management infor- 
mation extraction apparatus first generates a p x m table shown in FIG. 37 with p as the number of horizontal ruled 
lines of an input image of an unknown document and m as the number of horizontal ruled lines of a model (step S131). 

In this example, p = 12, m = 15, and the row and column numbers of the table begin with 0. The element (item) of
the j-th column in the i-th row in the table is data indicating the correspondence relation between the i-th ruled line of
the input image and the j-th ruled line of the model. Such a table is hereinafter referred to as a matching table.

Then, it is determined, according to the rough information, whether or not the i-th horizontal ruled line IP(i) of an 
input image corresponds to the j-th horizontal ruled line MO(j) of a model. If there is a possibility that they correspond 
to each other, a node is allotted to the element at the j-th column in the i-th row in the matching table (step S132).
Thus, a combination of the horizontal ruled line IP(i) and the horizontal ruled line MO(j) is described on the matching
table. At this time, the condition of the possibility of correspondence is not strictly set, but allows one ruled line to 
correspond to a plurality of ruled lines. 

In this example, the rough information (length1, twist, and position) of the ruled line IP(i) is set as (ipl, ipt, and ipp)
respectively, and the rough information of the ruled line MO(j) is set as (mol, mot, and mop) respectively. When the
difference between the corresponding values is smaller than a predetermined value, it is determined that the ruled line 
IP(i) can correspond to the ruled line MO(j). 

A practical condition for the possibility is set by the following equation. 



|ipl - mol| < β
|ipt - mot| < β (18)
|ipp - mop| < α

where parameters α and β are thresholds which respectively depend on the number of horizontal ruled lines and
the number of vertical ruled lines in the table.

These parameters α and β, which depend on the number of ruled lines, are positive integers. The smaller the number
of ruled lines is, the larger values they indicate. The larger the number of ruled lines is, the smaller values they indicate.
At this time, the condition of inequalities (18) extends the range of a search in a matching process if the density of the
ruled lines in the table is low, but reduces the range of a search in a matching process if the density of the ruled lines
is high. The parameters α and β can be defined, for example, as monotonically decreasing functions of the number
of horizontal and vertical ruled lines as shown in FIG. 38.

Thus, the similarity between an input image and a model in features relative to the outline portion of a table can be
extracted by representing by a node the correspondence relation between ruled lines similar in rough information.

Next, according to the detailed information, arranged nodes are searched for a combination of those satisfying a 
predetermined relationship, that is, those compatible with each other (step S133), and the compatible nodes are re- 
garded as belonging to the same group and connected with each other through a path. 

When node n(i, j) at the j-th column in the i-th row and node n(k, l) at the l-th column in the k-th row satisfy the
predetermined relationship, it indicates that the relationship between the i-th ruled line and the k-th ruled line of an
input image is proportional to the relationship between the j-th ruled line and the l-th ruled line of a model. That is,
when the i-th ruled line of an input image overlaps the j-th ruled line of a model, the k-th ruled line of the input image
overlaps the l-th ruled line of the model.

Connecting these nodes through a path makes it possible to classify the nodes into several groups. The larger the
number of nodes a group contains, the higher the similarity between an input document and a model the group represents.
Therefore, the similarity computation can be effectively performed in a model matching process on such a group
as contains a larger number of nodes.

When a node compatible with a specified node is searched for, a search is always performed with the nodes in an
area obliquely below and to the right of the specified node to improve the efficiency of the process. Thus, a clique as
shown in FIG. 29 is not generated, and a path connecting a large number of nodes can be obtained at a high speed. A
practical process of generating a path is described later.

Then, consistent combinations of paths are obtained from among the obtained set of paths, and are searched for 
the one containing the largest number of nodes (step S134). The detected combination of paths is defined as the 
optimum path set. A consistent combination of paths indicates that the ranges of a set of ruled lines corresponding to 
the nodes in respective paths do not overlap each other. 

In the matching table shown in FIG. 37, two cases are considered in which the ranges of two ruled line sets overlap 
each other. One is the case, as shown in FIG. 39, that a sequence relationship is reversed between an input image 
and a model. The other is the case, as shown in FIG. 40, that two or more ruled lines correspond to a ruled line.

In the matching table shown in FIG. 39, the range of the ruled lines on the model side belonging to a group indicated
by solid lines is considered to span from the 0th to the 9th ruled lines. The range of the ruled lines on the model side
belonging to a group indicated by broken lines is considered to span from the 7th to the 8th ruled lines. Therefore, the
ranges of the two ruled line sets overlap each other. Similarly, in FIG. 40, the ranges of the ruled line sets of the groups
indicated by solid lines and broken lines overlap on the model side.

In the optimum path set containing no inconsistent combinations of paths, the ranges of ruled line sets do not
overlap each other on either side of an input image or a model as shown in FIG. 41. The correspondence relation
among the ruled lines represented by nodes contained in the optimum path set is referred to as the optimum correspondence.

Next, assuming that the number of horizontal ruled lines of an input image is ph, the number of horizontal ruled
lines of a model is mh, and the number of nodes contained in the optimum path set for the horizontal ruled lines is
maxh, the similarity SH between the horizontal ruled lines of the input image and the model is computed by the following
equation (step S135).

SH = maxh/ph + maxh/mh (19)

The similarity SH indicates the sum of the ratio of ruled lines corresponding to the optimum path set in the ruled
lines of the input image and the ratio of ruled lines corresponding to the optimum path set in the ruled lines of the model.
Normally, the more similar the features of the input image are to the features of the model, the larger the sum becomes.

The management information extraction apparatus processes the vertical ruled lines as in the processes performed
on the horizontal ruled lines in steps S131 through S135. Assuming that the number of vertical ruled lines of an input
image is pv, the number of vertical ruled lines of a model is mv, and the number of nodes contained in the optimum
path set for the vertical ruled lines is maxv, the similarity SV between the vertical ruled lines of the input image and the
model is computed by the following equation.

SV = maxv/pv + maxv/mv (20)

Finally, the similarity S of the ruled lines between the input image and the model is computed by the following
equation using the SH and SV, thereby terminating the model matching process.

S = SH + SV (21) 
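
Equations (19) through (21) translate directly into code. The function names below are hypothetical; the arithmetic follows the equations.

```python
def line_similarity(n_matched, n_input, n_model):
    """Equations (19) and (20): share of matched ruled lines on the input
    side plus share of matched ruled lines on the model side."""
    return n_matched / n_input + n_matched / n_model


def form_similarity(maxh, ph, mh, maxv, pv, mv):
    """Equation (21): S = SH + SV."""
    return line_similarity(maxh, ph, mh) + line_similarity(maxv, pv, mv)
```

Since each of the four ratios is at most 1, S ranges from 0 to 4, and it reaches 4 only when every ruled line of both the input image and the model belongs to the optimum path set.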

For example, the similarity between a model and an input image is computed by performing the above described
matching process using each table candidate obtained by the rough classification as the model. In step S127, the
model indicating the highest similarity is output as the optimum model. Thus, a dictionary form corresponding to the 
input image can be obtained. 

Next, the node arranging process, the path generating process, and the optimum path set determining process
shown in FIG. 36 are described further in detail by referring to FIGs. 42 through 48.

FIG. 42 is a flowchart showing the node arranging process in step S132 shown in FIG. 36. In FIG. 42, the rough
information length1, twist, and position of the i-th ruled line of an input image are respectively represented by ipl(i), ipt(i), and
ipp(i), and the rough information about the j-th ruled line of a model is represented by mol(j), mot(j), and mop(j).

The data indicating the element at the j-th column in the i-th row on the matching table is represented by sign (i,
j). When sign (i, j) = 0, a node is not set at the corresponding element. When sign (i, j) = 1, a node is set at the
corresponding element.

When the process starts, the management information extraction apparatus first determines whether or not the
condition |ipp(i) - mop(j)| < α is fulfilled (step S141). Unless the condition is fulfilled, sign (i, j) is set to 0 (step S142),
thereby terminating the process.

If the condition in step S141 is fulfilled, then the management information extraction apparatus determines whether
or not the condition |ipt(i) - mot(j)| < β is fulfilled (step S143). Unless the condition is fulfilled, sign (i, j) is set to 0 (step
S144), thereby terminating the process.

If the condition in step S143 is fulfilled, then the management information extraction apparatus determines whether
or not the condition |ipl(i) - mol(j)| < β is fulfilled (step S145). Unless the condition is fulfilled, sign (i, j) is set to 0
(step S146), thereby terminating the process. If the condition in step S145 is fulfilled, then sign (i, j) is set to 1, and the
node is set at the j-th column in the i-th row (step S147), thereby terminating the process.

The above described processes are performed for all positions (i, j) of the matching table so that nodes indicating 
the correspondence between two ruled lines whose rough information is similar to each other are set at the position 
corresponding to the ruled lines. 
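
The node arranging process of FIG. 42, applied to every position (i, j), can be sketched as follows. The function name `arrange_nodes` and the representation of rough information as `(length1, twist, position)` tuples are assumptions; the three threshold tests correspond to steps S141, S143, and S145.

```python
def arrange_nodes(input_rough, model_rough, alpha, beta):
    """Build the matching table sign[i][j] from rough information.
    Each entry of input_rough / model_rough is (length1, twist, position)."""
    sign = [[0] * len(model_rough) for _ in input_rough]
    for i, (ipl, ipt, ipp) in enumerate(input_rough):
        for j, (mol, mot, mop) in enumerate(model_rough):
            if (abs(ipp - mop) < alpha          # step S141
                    and abs(ipt - mot) < beta   # step S143
                    and abs(ipl - mol) < beta): # step S145
                sign[i][j] = 1                  # step S147: node is set
    return sign
```

Because the thresholds are deliberately loose, one row of the table may contain several nodes; the later path generation and optimum path set steps are what resolve such ambiguities.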

FIGs. 43 and 44 are flowcharts showing the path generating process in step S133 shown in FIG. 36. When the
process starts, the management information extraction apparatus first performs an initializing process (step S151
shown in FIG. 43). In this process, the position (i, j) of the element at which a node is set on the matching table is
stored as a node string in a storage area in the memory. The nodes are arranged in an ascending order of row numbers
i in the storage area. When nodes are assigned the same row number i, they are arranged in an ascending order of
column numbers j. Each node in a node string is assigned a flag indicating whether or not it is connected through a path.

For example, the node string in the storage area corresponding to the matching table shown in FIG. 37 is as shown 
in FIG. 45. In the storage area shown in FIG. 45, the positions (0, 0), (1, 0), (1, 1), (2, 0), ..., (11, 14) of the nodes on
the matching table are sequentially stored, and the values of the flags are initialized to 1. If the value of a flag is 1, it 
indicates that a corresponding node is not yet connected through a path. 

Next, the leading data in the storage area is accessed (step S152), and i and j are read from the access point to
mark the element on the matching table corresponding to the position (step S153). The node of the marked element
is defined as a reference node with "sign" of the element set to 0 and the corresponding flag in the storage area set
to 0 (step S154).

Then, the value of the control variable "count" is set to 0 (step S155), and it is checked whether or not the marked
element corresponds to the last column of the matching table or whether or not the value of "count" has reached a
predetermined constant h (step S156). Unless these conditions are fulfilled, the marked position is moved by one
column to the right (step S157), and it is checked whether or not the position of the mark corresponds to the last row
(step S158).

If the position of the mark corresponds to the last row, then 1 is added to the value of "count" (step S159), and the
processes in and after step S156 are repeated. Unless the position of the mark corresponds to the last row, the mark
is moved by one row downward (step S160), and it is checked whether "sign" of the marked element is 0 or 1 (step S161).

If the value is 0, no nodes are set at the position of the mark. Therefore, the processes in and after step S158 are
repeated to check another element in the column. If "sign" indicates 1, then a node is set at the position of the mark,
and it is determined whether or not the node can be connected to the reference node through a path (step S162). It is
determined using the detailed information, that is, length2, differ, and height, between the ruled lines corresponding to
the nodes, whether or not the two nodes can be connected through a path.

For example, as shown in FIG. 46, the detailed information indicating the relationship between the ruled line 101
corresponding to the reference node and the ruled line 102 corresponding to the node to be determined in the input
image is set as length2 = L2/L1, differ = dw/L1, and height = dh/L1.

In the model, the detailed information indicating the relationship between the ruled line 103 corresponding to the
reference node and the ruled line 104 corresponding to the node to be determined is set as length2 = L2'/L1', differ =
dw'/L1', and height = dh'/L1'.

At this time, if the following inequalities are fulfilled using the empirical thresholds ε1, ε2, and ε3, the reference
node is compatible with the node to be determined and they can be connected to each other through a path.

|L2/L1 - L2'/L1'| < ε1
|dw/L1 - dw'/L1'| < ε2
|dh/L1 - dh'/L1'| < ε3     (22)

By setting thresholds ε1, ε2, and ε3 sufficiently small, inequalities (22) indicate that the graphics comprising the
ruled lines 101 and 102 are similar to the graphics comprising the ruled lines 103 and 104. If these ruled line graphics
are similar to each other, then there is a high possibility that the ruled line 102 corresponds to the ruled line 104 when
the ruled line 101 corresponds to the ruled line 103. Thus, these two nodes are regarded as being compatible with
each other.
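
The compatibility test of inequalities (22) can be sketched as follows. This is an illustration under stated assumptions, not the apparatus itself: the function names are hypothetical, the measurements L1, L2, dw, and dh are assumed to be available as plain numbers, and the threshold values of 0.05 are placeholders for the empirical thresholds.

```python
def detailed_info(l1, l2, dw, dh):
    """Return (length2, differ, height), each normalised by the length l1
    of the ruled line that corresponds to the reference node."""
    return (l2 / l1, dw / l1, dh / l1)


def compatible(inp, model, eps=(0.05, 0.05, 0.05)):
    """True when all three inequalities (22) hold.

    inp, model -- (length2, differ, height) for the input image and the model
    eps        -- thresholds for the three inequalities (placeholder values)
    """
    return all(abs(a - b) < e for a, b, e in zip(inp, model, eps))


# Hypothetical measurements: the model is roughly the input at twice the scale.
inp = detailed_info(100, 50, 10, 20)
model = detailed_info(200, 102, 19, 41)
print(compatible(inp, model))
```

Because every quantity is divided by the length of the reference ruled line, the comparison does not depend on the scale of the scanned image.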

Thus, under such a similarity condition for setting a path, the number of determinations of compatibility between
nodes can be reduced. For example, if node 97 is a reference node in the matching table shown in FIG. 37, then node
98 is considered to be compatible with node 99 under the condition that node 97 is compatible with node 98 and node
97 is compatible with node 99.

If it is determined that node 99 can be connected to the reference node 97 through a path, then it is determined
that node 99 can also be connected through a path to node 98 already connected to the reference node 97 through a
path.

When the node positioned at the mark cannot be connected to the reference node through a path, the processes
in and after step S158 are repeated to check another node in the same column. If they can be connected to each other
through a path, then the flag in the storage area corresponding to the node positioned at the mark is rewritten to 0
(step S163). Thus, it is recorded that the node is connected to the reference node or a node immediately before the
node on the path. Then, the processes in and after step S156 are repeated to check the node of the next column.

In the processes in and after step S156, the position of the mark is moved forward by one column and then by one
row to search for the element obliquely below to the right. A path can be sequentially extended in a direction obliquely
below and to the right in the matching table by repeating the above described processes.

If the condition in step S156 is fulfilled, it is checked whether or not the number of hits of the paths extending from 
the reference node is two or more (step S164 shown in FIG. 44). The number of hits refers to the number of nodes on 
the path. If the number of nodes on the path is two or more, then the path is formally registered and the information 
about the nodes on the path is stored (step S165). If the number of nodes on the path is 1, then no path extends
from the reference node to any other node. As a result, the path is not registered.

Next, it is checked whether or not there is data remaining unaccessed in the storage area (step S166). If there is
such data, the access point in the storage area is moved forward by one (step S167), and the value of the flag at the
position is checked (step S168). If the flag indicates 0, then the node at the position has already been added to the
path and the next data is checked by repeating the processes in and after step S166.

If the flag indicates 1, then the node at the position has not been added to the path. Therefore, the processes in
and after step S153 are repeated. Thus, a new path is generated with the node defined as a new reference node. In
step S166, if the access point in the storage area reaches the trailing point, then the process terminates.
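
The path generating process of steps S151 to S168 can be sketched as follows. This is a simplified illustration, not the flowcharts of FIGs. 43 and 44 themselves. The function name and the compatible callback are hypothetical, and one point is an assumption because the text does not state it: when a column contains no compatible node, the search in the next column resumes below the last node already on the path.

```python
def generate_paths(sign, compatible, h):
    """Extend paths obliquely down and to the right over a matching table.

    sign       -- 2-D list of 0/1 values; 1 marks a node at (row i, column j)
    compatible -- callable(reference_node, node) -> bool (hypothetical callback)
    h          -- number of columns without a hit after which the search stops
    """
    sign = [row[:] for row in sign]            # work on a copy of the table
    rows, cols = len(sign), len(sign[0])
    # Node string: ascending row number, then ascending column number.
    nodes = [(i, j) for i in range(rows) for j in range(cols) if sign[i][j]]
    flag = {n: 1 for n in nodes}               # 1 = not yet connected through a path
    paths = []
    for ref in nodes:
        if flag[ref] == 0:                     # already on a path: skip
            continue
        sign[ref[0]][ref[1]] = 0               # reference node
        flag[ref] = 0
        path, count = [ref], 0
        row, col = ref
        while col < cols - 1 and count < h:
            col += 1                           # one column to the right
            hit = None
            for r in range(row + 1, rows):     # move downward within the column
                if sign[r][col] and compatible(ref, (r, col)):
                    hit = (r, col)
                    break
            if hit is None:
                count += 1                     # assumption: the row is kept as is
            else:
                flag[hit] = 0
                path.append(hit)
                row = hit[0]
        if len(path) >= 2:                     # register only paths with two or more hits
            paths.append(path)
    return paths


# Hypothetical 4 x 4 table with nodes on the diagonal, all mutually compatible.
diagonal = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(generate_paths(diagonal, lambda ref, node: True, h=1))
```

Compatibility is tested against the reference node only, which relies on the transitive compatibility condition described above.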

FIG. 47 is a flowchart showing the optimum path set determining process in step S134 shown in FIG. 36. In this
process, a matching table of p rows and m columns of horizontal ruled lines or vertical ruled lines is handled using the
array score (i) (i = 0, 1, 2, ..., m) indicating the number of nodes of a provisional path set for the optimum path set and
the array rireki (i) (i = 0, 1, 2, ..., m) indicating the row number.

When the process starts, the management information extraction apparatus first sets the score (m) indicating the
initial value of the number of nodes of the optimum path set to 0, and sets the rireki (m) indicating the initial value of
the row number to p-1 (step S171).

Next, the variable i indicating the column number is set to m-1 (step S172), and in the registered paths, a set of
paths including the upper left node corresponding to the column number i as a starting point is set as Path (i) (step
S173). Then, score (i) is set to equal score (i + 1), and rireki (i) is set to equal rireki (i + 1) (step S174). The score (i)
indicates the number of nodes of the provisional path set in the range from the i-th column to the last column ((m-1)-th
column).

Next, one of the paths is obtained from the set Path (i), and score (i) is updated according to the information about
its nodes (step S175). Then, it is checked whether or not a path remains in the set Path (i) (step S176). If yes, the next
path is obtained and the computation of score (i) is repeated.

When the computation of all paths in the set Path (i) is completed, it is determined whether or not i has reached
0 (step S177). If i is equal to or larger than 1, i is set to i-1 (step S178), and the processes in and after step S173 are
repeated. When i has reached 0, the obtained value of score (0) is defined as the number of nodes of the final optimum
path set (step S179), thereby terminating the process.

The value of score (0) obtained from the matching table of horizontal ruled lines is used as maxh in equation (19)
in computing the similarity. The value of score (0) obtained from the matching table of vertical ruled lines is used as
maxv in equation (20) in computing the similarity.

Next, the node number updating process in step S175 shown in FIG. 47 is described by referring to FIG. 48. When
the node number updating process starts, the management information extraction apparatus first retrieves one of the
paths from the set Path (i). The row number of the starting point of the path is set as sg, and the column number and
the row number of the node at the lower right ending point of the path are respectively set as er and eg. The number
of nodes contained in the path is set as "hits" (step S181).

For example, in the matching table shown in FIG. 37, Path (11) contains paths p1 and p2 in the area obliquely
below to the right when i = 11. For path p1, the values sg, er, and eg are respectively 8, 14, and 11. For path p2, the
values sg, er, and eg are respectively 6, 12, and 7.

Next, the variable j indicating the column number is set to er + 1 (step S182), and the value of eg is compared
with rireki (j) (step S183). In this case, if the value of eg is larger than rireki (j), it is determined whether or not score (j)
+ hits > score (i) is fulfilled, or both score (j) + hits = score (i) and eg < rireki (i) are fulfilled (step S184).

If either of the above described conditions is fulfilled, score (i) is set as score (j) + hits, and rireki (i) is set as eg 
(step S185), thereby terminating the process. 

If eg is equal to or smaller than rireki (j) in step S183 or neither of the conditions in step S184 is fulfilled, then j is
set to j + 1 (step S186), and j is compared with m (step S187). If j is equal to or smaller than m, then the processes in
and after step S183 are repeated. If j exceeds m, then the process terminates.

Thus, a new provisional path set for the optimum path set is extracted from sets each obtained by adding one path
to the provisional path set determined in the immediately previous process, and the number of its nodes is recorded
in score (i). The number of nodes of the provisional path set for the optimum path set in the range from the i-th
column to the last column is obtained by repeating these processes on all paths of Path (i).

For example, in FIG. 37, two combinations, that is, path p1 only and the combination of paths p2 and p3, can be 
considered as the combination of consistent paths in the range from the 11th column to the last column. Since the 
number of nodes of these combinations is 4 in either case, score (11) equals 4. 
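
The idea of selecting a consistent combination of paths with the largest number of nodes can be sketched as follows. This is a simplified substitute and not the score/rireki procedure of FIGs. 47 and 48: it assumes that two paths are consistent when the second path starts strictly below and strictly to the right of the end of the first, and it uses a plain dynamic program over the paths. The function name is hypothetical, and the coordinates of path p3 are invented for illustration because the text does not give them.

```python
def best_path_set_size(paths):
    """Return the largest total node count over consistent path combinations.

    paths -- list of paths, each a list of (row, column) nodes ordered from the
             upper left starting point to the lower right ending point
    """
    # Process paths from right to left so that possible successors are scored first.
    order = sorted(paths, key=lambda p: p[0][1], reverse=True)
    best = []  # best[k]: most nodes in a combination whose first path is order[k]
    for k, path in enumerate(order):
        end_row, end_col = path[-1]
        follow = [best[t] for t in range(k)
                  if order[t][0][0] > end_row and order[t][0][1] > end_col]
        best.append(len(path) + max(follow, default=0))
    return max(best, default=0)


p1 = [(8, 11), (9, 12), (10, 13), (11, 14)]   # sg = 8, er = 14, eg = 11
p2 = [(6, 11), (7, 12)]                       # sg = 6, er = 12, eg = 7
p3 = [(9, 13), (10, 14)]                      # hypothetical coordinates
print(best_path_set_size([p1, p2, p3]))
```

With these sample paths, both p1 alone and the combination of p2 and p3 contain four nodes, in line with the example above.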

The above described form identifying process is applied not only to the management information extraction
apparatus but also to any image recognition apparatus such as a document recognition apparatus, a drawing reading
apparatus, etc., and is effective in identifying the structure of ruled lines of an arbitrary image.

In the form identifying process according to the present embodiment, the relationship among ruled lines is used
as a feature. Therefore, a stable and correct identification can be attained even if a part of ruled lines cannot be
successfully extracted due to a break in a line or noise, etc. when the structure of the ruled lines is extracted from an
input table-formatted document and is matched with the form of the entered table-formatted document. Especially, a
high robustness can be obtained by setting a broad condition for the arrangement of nodes to reduce the deterioration
of the precision in extracting contour ruled lines, which are likely to be unstably extracted because of the influence of
noise.

Stable and correct identification can be attained in altering a form by adding or deleting one row if the optimum
path set is obtained as a combination of one or more paths. Furthermore, the number of compatibility checking
processes can be reduced by setting a transitional compatibility condition relating two nodes, thereby performing a
high-speed identifying process.

According to the present invention, the form of an image of a table-formatted document, etc. and the position of
management information can be automatically learned and stored in the dictionary. Therefore, according to the stored
information, the position of the management information in an arbitrary input image can be computed with a high
precision.

Particularly, since a feature which is stable to the fluctuation of image information is used, management information
can be successfully extracted from a broken or distorted document image. Furthermore, the management information
can be extracted at a high speed because form learning and comparing processes are performed while candidates
are progressively limited in two steps, that is, in rough classification and detailed identification, and the detailed
identification is performed in a one-dimensional matching using the feature of the outline form of a table.

Additionally, since the management information is stored and retrieved using not only a character code but also
an image itself, even characters that are difficult to recognize, such as textured characters, can be handled as
management information.

Claims

1. A management information extraction apparatus comprising:

computation means (22) for computing a position of management information contained in an arbitrary input
image according to relative position information about a ruled line to an outline portion of a table area contained
in the input image; and

extraction means (25) for extracting the management information from the input image based on the position
computed by said computation means (22).

2. The management information extraction apparatus according to claim 1, wherein

said computation means (22) obtains, as information about the outline portion of the table area, at least one
of a reference size of the table area and a position of a reference point around an outline of the table area.

3. The management information extraction apparatus according to claim 1, wherein

said computation means (22) obtains, as information about the outline portion of the table area, positions of
two or more reference points (52, 53, 54, and 55) around an outline of the table area, and computes the position
of the management information according to position information relative to the two or more reference points (52,
53, 54, and 55).

4. The management information extraction apparatus according to claim 1, 2 or 3, wherein

said computation means (22) computes the position of the management information using as a feature of a
structure of ruled lines at least one or more pieces of position information about an intersection between two ruled
lines, a state of the intersection between two ruled lines, the number of intersections contained in the input image,
and a frequency of a rectangular cell of a specific form encompassed by ruled lines.

5. The management information extraction apparatus according to claim 4, wherein

said computation means (22) obtains the feature of the structure of the ruled lines after distinguishing a case
in which a ruled line is a solid line from a case in which a ruled line is a broken line.

6. The management information extraction apparatus according to any preceding claim, wherein

said computation means (22) computes the position of the management information using reliability in
extracting the ruled line as a feature of a structure of ruled lines.

7. The management information extraction apparatus according to any preceding claim, wherein

said computation means (22) computes the position of the management information using, as a feature of
a structure of ruled lines, a ratio of two or more distances between a plurality of intersections (x1, x2, x3, and x4)
arranged on the ruled line.

8. The management information extraction apparatus according to claim 7, wherein

said computation means (22) extracts a sequence of the plurality of intersections (x1, x2, x3, and x4) on
ruled lines from around an outline of the table area, obtains a feature vector using the ratio of the distances as an
element corresponding to each of the ruled lines, and represents a feature of a form of the outline of the table area
using the feature vector.

9. The management information extraction apparatus according to any preceding claim, wherein

said computation means (22) obtains a feature of a form of an outline of the table area in at least one of four
directions, that is, a right, left, upward, and downward directions from outside the input image, and computes the
position of the management information using the feature of the form of the outline.




10. The management information extraction apparatus according to any preceding claim, further comprising:

dictionary means (23, 31, and 73) for storing a feature of a structure of ruled lines of one or more table forms,
and the position information of the management information in each table form; and

comparison means (24) for comparing the feature of the structure of ruled lines of the input image with the
feature of the structure of ruled lines stored in said dictionary means (23, 31, and 73), wherein
said computation means (22) refers to the position information of the management information stored in said
dictionary means (23) based on a comparison result from said comparison means (24), and computes the
position of the management information of the input image.

11. The management information extraction apparatus according to claim 10, wherein

said comparison means (24) limits candidates of table forms to be compared using the feature of the structure
of ruled lines for rough classification, makes a comparison using the feature of the structure of ruled lines for
detailed identification, and determines a table form corresponding to the input image.

12. The management information extraction apparatus according to claim 11, wherein

said comparison means (24) determines the table form corresponding to the input image by a dynamic
programming matching process.

13. The management information extraction apparatus according to claim 10, 11 or 12, wherein

said dictionary means (23, 31, and 73) stores position information of a rectangular cell (51) encompassing
the management information as the position information of the management information in each table form.

14. The management information extraction apparatus according to claim 13, wherein

said dictionary means (23, 31, and 73) stores one or more difference vectors between one or more vertexes
(56, 57, 58, and 59) of the rectangular cell (51) and one or more vertexes (52, 53, 54, and 55) of a table containing
the rectangular cell (51) as the position information of the rectangular cell (51).

15. The management information extraction apparatus according to claim 14, wherein

said computation means (22) obtains a stable vertex of the table area of the input image according to the
comparison result, and computes the position of the management information of the input image using a difference
vector from the stable vertex.

16. The management information extraction apparatus according to claim 15, wherein

said dictionary means (23, 31, and 73) further stores a size of the rectangular cell (51); and
said computation means (22) computes the position of the management information of the input image from
a rectangular cell which has a size corresponding to the size of the rectangular cell (51) and is located near
a position specified by the difference vector.

17. The management information extraction apparatus according to claim 13, 14, 15, or 16, wherein

said dictionary means (23, 31, and 73) further stores a size of each table of the table forms; and
said computation means (22) computes a size ratio from a size of the table area of the input image and a size
of a corresponding table in the dictionary means (23, 31, and 73), and computes the position of the
management information of the input image based on the size ratio.

18. The management information extraction apparatus according to any of claims 10 to 17, wherein

said comparison means (24) obtains a plurality of possible combinations of ruled lines extracted from the
input image and corresponding ruled lines contained in information of said dictionary means (23, 31, and 73),
extracts a group of two or more compatible combinations among the plurality of combinations, and compares the
form of the input image with each table form according to information about the combinations in the group.

19. A management information extraction apparatus, comprising:

dictionary means (23, 31, and 73) for storing a feature of a structure of ruled lines in one or more table forms
and position information of management information in each of the table forms;

comparison means (24) for comparing a feature of a structure of ruled lines of an input image with the feature
of the structure of the ruled lines stored in said dictionary means (23, 31, and 73);

[...] the position of the management information specified by a user in said [...]

20. The management information extraction apparatus according to claim 19, further comprising:

[...]

21. The management information extraction apparatus according to claim 19 or 20, wherein

22. An image accumulation apparatus comprising:

23. The image accumulation apparatus according to claim 22, further comprising:

24. The image accumulation apparatus according to claim 22, further comprising:

25. The image accumulation apparatus according to claim 24, wherein

26. A form identification apparatus comprising:

storage means (26) for storing ruled line information for a table form;

group generation means (27) for obtaining a plurality of possible combinations of ruled lines extracted from
an input image and corresponding ruled lines contained in ruled line information of said storage means (26) [...]

27. The form identification apparatus according to claim 26, further comprising:

28. The form identification apparatus according to claim 26 or 27, wherein

said group generation means (27) compares a relative value of a feature of an outline portion of the input
image to a feature of each ruled line with a relative value of a corresponding feature of the table form, determines
a possibility of correspondence between a ruled line of the input image with a ruled line of the table form, and
generates a possible combination of ruled lines.

29. The form identification apparatus according to claim 26 or 27, wherein 

said group generation means (27) compares a relative relationship between ruled lines contained in the input 
image with a relative relationship between ruled lines of the table form, and determines whether or not the two or 
more combinations are compatible with each other. 


30. The form identification apparatus according to claim 26 or 27, wherein

said group generation means (27) includes:

table generation means (41 and 42) for generating a matching table by arranging the ruled lines of the input
image in a first direction, arranging the ruled lines of the table form in a second direction, and defining a
combination of an i-th ruled line of the input image and a j-th ruled line of the table form as a node at a position
of an element (i, j); and

path generation means (41 and 42) for connecting two nodes corresponding to a compatible combination with
a path on the matching table.


31. The form identification apparatus according to claim 30, wherein 

said path generation means searches for a next node compatible with the node at the position of the element 
(i, j) within a range of an element (x, y) where x > i and y > j, and sets a search range similar to the range of the 
element (x, y) based on the next node when the next node is obtained. 


32. A computer-readable storage medium (42, 45, and 50) used to direct a computer to perform the functions of:

computing a position of management information contained in an arbitrary input image according to relative
position information about a ruled line to an outline portion of a table area contained in the input image; and
extracting the management information from the input image based on the computed position.

33. A computer-readable storage medium (42, 45, and 50) used to direct a computer to perform the functions of:

preliminarily entering a position of management information of one or more table forms specified by a user as
position information;

comparing a feature of a structure of ruled lines of an input image with a preliminarily stored feature of a
structure of ruled lines of the one or more table forms; and

referring to the position information based on a comparison result, and extracting the management information
of the input image.


34. A computer-readable storage medium (42, 45, and 50) used to direct a computer to perform the functions of: 

storing image information as management information for an accumulated image; and 
retrieving the image information. 


35. A computer-readable storage medium (42, 45, and 50) used to direct a computer to perform the functions of:

obtaining a plurality of possible combinations of ruled lines extracted from an input image and corresponding
ruled lines of a preliminarily stored table form;
extracting a group of two or more compatible combinations among the plurality of combinations in such a way
that no combinations in another group can be contained; and

comparing the input image with the table form according to information about the combinations contained in
one or more extracted groups.

36. A management information extracting method, comprising the steps of:

computing a position of management information contained in an arbitrary input image according to relative
position information about ruled lines to an outline portion of a table area contained in the input image; and
extracting the management information from the input image based on the computed position.

37. A management information extracting method, comprising the steps of:

preliminarily entering a position of management information of one or more table forms specified by a user as
position information;



38. An image accumulation method, comprising the steps of:

storing image information as management information for an accumulated image; and
retrieving the image information.

39. A form identification method, comprising the steps of:



comparing the input image with the table form according to information about the combinations contained in
one or more extracted groups.